Dargel, Lukas, and Christine Thomas-Agnan (2024). Pairwise share ratio interpretations of compositional regression models. Computational Statistics & Data Analysis, Volume 195, July 2024, 107945 Accessed 08 Mar 2024. Online at https://doi.org/10.1016/j.csda.2024.107945
The vignette contains all the code necessary to reproduce the article’s results and provides additional illustrations.
Code
# cran dependencieslibrary("colorspace")library("CoDaImpact")library("compositions")library("data.table")library("here")library("ggplot2")library("kableExtra")library("skimr")library("stringr")library("zCompositions")# local functionssave_baseplot <-function(device, ..., which){suppressMessages({dev.copy(device, ..., which)dev.off()invisible(TRUE) })}# optionsoptions(knitr.kable.NA ='',scipen =9,warn =1)# datamun_elec2census <-readRDS(here("out/data/mun_elec2census.Rds")) # election data combined with the censusmun_elec <-readRDS(here("in/data/mun_elec.Rds")) # original election data
1 Introduction
We illustrate the use of compositional data analysis (CoDa) to analyze the results of political elections using the example of the French presidential elections of 2022. Our primary focus is the illustration of mathematical tools that enable new ways to interpret CoDa regression models. Therefore, other issues, such as variable selection and zeros handling, are treated more briefly.
2 Descriptive statistics
The study combines the official election results with census data:
Let us have a first look at the combined data set.
Code
skim(mun_elec2census)
Data summary
Name
mun_elec2census
Number of rows
34820
Number of columns
28
Key
ID_MUN
_______________________
Column type frequency:
character
3
numeric
25
________________________
Group variables
None
Variable type: character
skim_variable
n_missing
complete_rate
min
max
empty
n_unique
whitespace
ID_MUN
0
1
5
5
0
34820
0
NAME_MUN
0
1
1
45
0
32597
0
NAME_DEP
0
1
3
23
0
96
0
Variable type: numeric
skim_variable
n_missing
complete_rate
mean
sd
p0
p25
p50
p75
p100
hist
ABSTENTIONS
0
1
310.15
2230.35
0.00
31.00
70.00
181.00
296668.00
▇▁▁▁▁
BLANK
0
1
14.97
80.08
0.00
2.00
5.00
12.00
11028.00
▇▁▁▁▁
INVALID
0
1
6.48
38.74
0.00
1.00
2.00
5.00
5267.00
▇▁▁▁▁
ARTHAUD_Nathalie
0
1
5.38
23.20
0.00
0.00
2.00
5.00
2891.00
▇▁▁▁▁
ROUSSEL_Fabien
0
1
22.78
126.89
0.00
2.00
6.00
17.00
17267.00
▇▁▁▁▁
MACRON_Emmanuel
0
1
269.64
2294.94
0.00
29.00
68.00
178.00
372820.00
▇▁▁▁▁
LASSALLE_Jean
0
1
31.17
107.58
0.00
6.00
13.00
29.00
12139.00
▇▁▁▁▁
LE_PEN_Marine
0
1
227.81
777.14
0.00
36.00
83.00
197.00
72772.00
▇▁▁▁▁
ZEMMOUR_Eric
0
1
69.17
581.52
0.00
8.00
18.00
46.00
86088.00
▇▁▁▁▁
MELENCHON_Jean_Luc
0
1
208.91
2104.10
0.00
18.00
43.00
113.00
317372.00
▇▁▁▁▁
HIDALGO_Anne
0
1
16.92
144.04
0.00
1.00
4.00
12.00
22901.00
▇▁▁▁▁
JADOT_Yannick
0
1
45.07
495.50
0.00
3.00
9.00
27.00
80268.00
▇▁▁▁▁
PECRESSE_Valerie
0
1
46.67
419.01
0.00
6.00
13.00
32.00
69564.00
▇▁▁▁▁
POUTOU_Philippe
0
1
7.45
41.47
0.00
1.00
2.00
6.00
5732.00
▇▁▁▁▁
DUPONT_AIGNAN_Nicolas
0
1
20.05
81.22
0.00
3.00
7.00
17.00
9591.00
▇▁▁▁▁
C19_POP15P_CS1
0
1
11.82
17.86
0.00
0.00
5.17
15.08
536.25
▇▁▁▁▁
C19_POP15P_CS2
0
1
54.18
436.78
0.00
5.00
15.13
40.36
68343.41
▇▁▁▁▁
C19_POP15P_CS3
0
1
148.16
3186.69
0.00
5.00
19.83
55.70
558207.31
▇▁▁▁▁
C19_POP15P_CS4
0
1
218.08
1845.06
0.00
19.44
49.77
135.00
269298.92
▇▁▁▁▁
C19_POP15P_CS5
0
1
245.49
1743.72
0.00
21.78
56.31
149.89
225097.04
▇▁▁▁▁
C19_POP15P_CS6
0
1
185.08
864.01
0.00
20.61
55.00
135.00
78049.73
▇▁▁▁▁
C19_POP15P_CS7
0
1
418.71
2564.74
0.00
50.00
110.99
281.25
345027.83
▇▁▁▁▁
C19_POP15P_CS8
0
1
255.37
2551.22
0.00
18.20
44.66
116.70
327076.11
▇▁▁▁▁
AREA
0
1
15.76
17.32
0.03
6.51
10.96
18.94
757.42
▇▁▁▁▁
POPULATION
0
1
1862.19
15018.55
1.00
197.00
454.00
1147.00
2175601.00
▇▁▁▁▁
2.2 Compare losses due to preprocessing
We lose some of the registered voters in France because the census data only covers individuals living in mainland France. Our analysis does not cover the overseas territories and voters living in a foreign country. Let us compare the original and combined voting data to assess the losses and potential biases.
Because CoDa models cannot effectively handle cases where dependent or independent compositional variables have zero proportions, it is necessary to address the zeros issue before estimating the models. To address this, we will initially employ amalgamations and impute the remaining zero values by a small constant. The amalgamation process combines various professional categories or political candidates into larger blocks. However, it is important to note that our proposed grouping is ad hoc, and a case study of electoral sociology should certainly place more thought into these actions.
2.3.1 In the vote data
We first check for zeros in the dependent variables.
To reduce the number of zero votes, we group the minor candidates into a “left” and a “right” block, each containing four candidates. Additionally, we group invalid votes, blank votes and abstentions into the “No vote” block. After these amalgamations, we obtain a visual impression of the zero patterns in the new composition using the zCompositions::zPatterns() function. The graph reveals that almost \(98.5\%\) of the municipalities have non-zero compositions now.
The remaining zeros are imputed by simply adding one vote to all candidates (or blocks) in each municipality with at least one zero. This imputation affected 464 municipalities, adding an overall number of 2784 fictive voters.
We compare the data distribution in the original and imputed data sets to ensure that the imputation does not distort our analysis. Comparing the two distributions below reveals no noticeable changes in the mean but a slight increase in the variance. The effect on the variance is not surprising since our imputation method replaces zeros (red bars in the graphic) with points close to the edge of the triangle, which are extreme values in the simplex sense.
Code
simplex_isodensity <-function( data,quantiles =c(0.5,1:9,9.5)/10,labels =names(cen),col1 ="black",col2 ="grey45",plot_data =TRUE) {# code adapted from:# Van den Boogaart and Tolosana-Delgado (2013), page 52stopifnot(ncol(data) ==3) cen <-mean(data) var <-var(data)if (plot_data) plot(data, pch =".", col ="grey85", labels = labels, mp =NULL, lenMissingTck =0.02)plot(cen,pch=4, col = col1, labels = labels, add = plot_data)for (p in quantiles) { r =sqrt(qchisq(p=p,df=2))ellipses(cen,var,r, col=col2) }}
The professional categories data directly comes from the French population census provided by the INSEE. It categorizes the population above 15 into one of eight groups. The categories correspond to
Since the number of municipalities having zero share of some of these categories is too large, we amalgamate the eight original groups into four broader categories. Using the new categorization, about \(95\%\) of the municipalities have no zero value in all components.
Using the previous strategy, we add one individual to all PCs in each municipality with at least one zero component. Overall, this imputation affected 1681 municipalities, adding a total of 6724 fictive individuals.
We proceed to compare the distribution of the PC compositions before and after our imputation process. In the graphics below, the mean seems unaffected, while the variance grows more substantially than in the case of the VOTE composition.
Since election data corresponds to aggregations of individual choices, we should suspect heteroskedasticity. The scatter plot below shows that the number of voters directly impacts the turnout rate. Additionally, its volatility decreases with the size of the municipality in terms of the number of voters.
# After the imputations, we decide on the data for modelingmun2model <-data.frame(ID_MUN = mun_elec2census$ID_MUN,REGISTERED =rowSums(as.matrix(VOTE_IMP)),REGISTERED_G =Reduce("*", lapply(VOTE_IMP, as.numeric)),POP_DENSITY = mun_elec2census$POPULATION / mun_elec2census$AREA)mun2model[["VOTE"]] <-as.matrix(VOTE_IMP)/mun2model$REGISTEREDmun2model[["PROFCAT"]] <-as.matrix(PROFCAT_IMP)/rowSums(PROFCAT_IMP)
4 Understanding “linear” changes in compositional variables
In this section, we illustrate how compositional variables change based on the PC composition in Paris. Below, we have the observed composition for the Paris municipality.
Upper Workers Retired Other
[1,] 0.4421546 0.1619779 0.1843559 0.2115117
In the following subsections, take the composition of Paris as a point of departure and add a sequence of linear increments, where linearity refers to the simplex geometry. It should be evident that a linear increment of any multivariate vector is defined by a direction and a step size \(h=0.1\). Since we normalized the direction to unit length \(h\) reflects the Aitchison distance between two sequential points.
4.1 Changing in direction of the first vertex
Code
paris_pcseq1 <-CoDa_path(comp_direc =c(2,1,1,1),comp_from = paris$PROFCAT,add_opposite =TRUE,step_size = .1) # take the same step size as before
We consider the direction that points to the first vertex of the simplex, in our example, the share of “Upper”. The two ternary diagrams below reveal how the PC composition changes along the path defined by this direction. The initial composition observed for Paris is shown as a red cross. One particularity of this direction is that the subcomposition excluding “Other” remains constant along the path.
Code
opar <-par(mfrow =c(1,2),mai =c(0,.6,0,.6), oma =c(0,0,0,0), mar =c(2,3,0,3))plot.acomp(paris_pcseq1[,-4])plot.acomp(paris_pcseq1["0",-4, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(paris_pcseq1[,-1])plot.acomp(paris_pcseq1["0",-1, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)save_baseplot(pdf, here("out/figures/","Xe_change_ternary.pdf"), width =7, height =3)par(opar)
Both graphics above make clear that the ratios between all components except “Upper” remain constant. This also explains why the path in the left ternary plot appears linear in the Euclidean sense. Below, we represent the same path as a stacked bar plot, where the ratio of bar heights remains constant except when “Upper” is involved.
Code
names(paris_pcseq1) <-colnames(PROFCAT)window <-as.character(seq(-40,40,by =2))barplot(acomp(paris_pcseq1[window,,drop=FALSE]), xlab ="value of h")save_baseplot(pdf, here("out/figures","Xe_change_barplot.pdf"))
Another way to look at the changing composition is to present the changes to all components as a function of the share of “Upper”. In this graphic, we additionally display the empirical distribution of the share of “Upper” over all municipalities. The vertical line intersects the other in the observed composition for Paris, showing that Paris is an extreme observation in the “Upper” dimension.
Code
paris_pcseq1_gg <-data.frame(paris_pcseq1)colnames(paris_pcseq1_gg) <-paste0("PC", 1:4)ggplot(paris_pcseq1_gg) +geom_vline(xintercept = paris$PROFCAT[[1]]) +geom_line(aes(x = PC1, y = PC1, col ="1", lty ="1"), lwd = .51) +geom_line(aes(x = PC1, y = PC2, col ="2", lty ="2"), lwd = .51) +geom_line(aes(x = PC1, y = PC3, col ="3", lty ="3"), lwd = .51) +geom_line(aes(x = PC1, y = PC4, col ="4", lty ="4"), lwd = .51) +geom_hline(yintercept =1, col ="black") +geom_boxplot(data =data.frame(PC1 = mun2model$PROFCAT[,1]),mapping =aes(x = PC1, y =1+ .0625),varwidth =TRUE,size = .5,col ="red",alpha = .25,width = .05) +scale_x_continuous(labels = scales::percent,sec.axis =dup_axis(name="Boxplot of % in professional category: Upper")) +scale_y_continuous(labels = scales::percent,breaks =seq(0,1,.25)) +scale_color_manual(breaks =c("1", "2", "3", "4"),values =c("Red", "Orange", "Dark Blue", "Yellow"),name ="professional\ncategory",labels =colnames(PROFCAT)) +scale_linetype_manual(breaks =c("1", "2", "3", "4"),values =1:4,name ="professional\ncategory",labels =colnames(PROFCAT)) +labs(x ="% in professional category: Upper",y ="% in professional category") +coord_equal() +theme_bw()ggsave(here("out/figures/Xe_change_scatter_with_details.pdf"), height =6, width =6)
4.2 Changing in a general direction
We repeat the same linear increment procedure for a more general direction. Here, we chose a path for which the share of “Upper” passes through the whole interval \((0,1)\), because we want to show graphics comparable to the previous ones.
Code
dir_gen <-c(.1,.5,.25,.15)paris_pcseq2 <-CoDa_path(comp_direc = dir_gen,comp_from = paris$PROFCAT,add_opposite =TRUE,step_size =attr(paris_pcseq1,"step_size")) # take the same step size as before
An alternative direction is the one that points from the observed point for Paris to the origin of the simplex. The two ternary diagrams below show a path generated along this direction, where the red cross is the empirical composition for Paris and the blue star is the origin.
Code
opar <-par(mfrow =c(1,2),mai =c(0,.6,0,.6), oma =c(0,0,0,0), mar =c(2,3,0,3))plot.acomp(paris_pcseq2[,-4])plot.acomp(paris_pcseq2["0",-4, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(paris_pcseq2[,-1])plot.acomp(paris_pcseq2["0",-1, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)save_baseplot(pdf, here("out/figures/","Xu_change_ternary.pdf"), width =7, height =3, bg ="transparent")par(opar)
The simplex-linear path this direction generates looks more complex than the previous one because it is non-linear in the Euclidean sense. For this path, the ratios of the other components are no longer constant, and the changes are generally not monotonic. The lack of monotonicity is evident from the shares of “Other” and “Retired” that peak close to the center of our variation and decline towards both edges.
Code
names(paris_pcseq2) <-colnames(PROFCAT)window <-as.character(seq(-80,80,by =4))barplot(acomp(paris_pcseq2[window,,drop=FALSE]), xlab ="value of h")
The intersection with the black line is again the empirical composition of Paris.
Code
paris_pcseq2_gg <-data.frame(paris_pcseq2)colnames(paris_pcseq2_gg) <-paste0("PC", 1:4)ggplot(paris_pcseq2_gg) +geom_vline(xintercept = paris$PROFCAT[[1]]) +geom_line(aes(x = PC1, y = PC1, col ="1", lty ="1"), lwd = .51) +geom_line(aes(x = PC1, y = PC2, col ="2", lty ="2"), lwd = .51) +geom_line(aes(x = PC1, y = PC3, col ="3", lty ="3"), lwd = .51) +geom_line(aes(x = PC1, y = PC4, col ="4", lty ="4"), lwd = .51) +geom_hline(yintercept =1, col ="black") +geom_boxplot(data =data.frame(PC1 = mun2model$PROFCAT[,1]),mapping =aes(x = PC1, y =1+ .0625),varwidth =TRUE,size = .5,col ="red",alpha = .25,width = .05) +scale_x_continuous(labels = scales::percent,sec.axis =dup_axis(name="Boxplot of % in professional category: Upper")) +scale_y_continuous(labels = scales::percent,breaks =seq(0,1,.25)) +scale_color_manual(breaks =c("1", "2", "3", "4"),values =c("Red", "Orange", "Dark Blue", "Yellow"),name ="professional\ncategory",labels =colnames(PROFCAT)) +scale_linetype_manual(breaks =c("1", "2", "3", "4"),values =1:4,name ="professional\ncategory",labels =colnames(PROFCAT)) +labs(x ="% in professional category: Upper",y ="% in professional category") +coord_equal() +theme_bw()ggsave(here("out/figures/Xu_change_scatter_with_details.pdf"), height =6, width =6)
4.3 Changing in direction of the origin
If the direction and the starting point coincide, the path will pass through the origin of the simplex.
Code
paris_pcseq3 <-CoDa_path(comp_direc = paris$PROFCAT,comp_from = paris$PROFCAT,add_opposite =TRUE,step_size =attr(paris_pcseq1,"step_size")) # take the same step size as before
The two ternary diagrams below show a path generated along this direction, where the red cross is the empirical composition for Paris and the blue star is the origin.
Code
opar <-par(mfrow =c(1,2),mai =c(0,.6,0,.6), oma =c(0,0,0,0), mar =c(2,3,0,3))plot.acomp(paris_pcseq3[,-4])plot.acomp(paris_pcseq3["0",-4, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(c(1,1,1), col ="blue", add =TRUE, pch =8, cex =2)plot.acomp(paris_pcseq3[,-1])plot.acomp(paris_pcseq3["0",-1, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(c(1,1,1), col ="blue", add =TRUE, pch =8, cex =2)par(opar)
The simplex-linear path generated by this direction is again non-linear in the Euclidean sense and also leads to non-monotonic changes in the shares of “Other” and “Retired”.
Code
names(paris_pcseq3) <-colnames(PROFCAT)window <-as.character(seq(-40,40,by =2))barplot(acomp(paris_pcseq3[window,,drop=FALSE]), xlab ="value of h")
When looking at the functional representation of this linear path, we see that the shares for each professional category cross at the origin of the 4-dimensional simplex. The intersection with the black line is again the empirical composition of Paris.
Code
paris_pcseq3_gg <-data.frame(paris_pcseq3)colnames(paris_pcseq3_gg) <-paste0("PC", 1:4)ggplot(paris_pcseq3_gg) +geom_vline(xintercept = paris$PROFCAT[[1]]) +geom_vline(xintercept =1/4, col ="grey55") +geom_line(aes(x = PC1, y = PC1, col ="1", lty ="1"), lwd = .51) +geom_line(aes(x = PC1, y = PC2, col ="2", lty ="2"), lwd = .51) +geom_line(aes(x = PC1, y = PC3, col ="3", lty ="3"), lwd = .51) +geom_line(aes(x = PC1, y = PC4, col ="4", lty ="4"), lwd = .51) +geom_hline(yintercept =1, col ="black") +geom_boxplot(data =data.frame(PC1 = mun2model$PROFCAT[,1]),mapping =aes(x = PC1, y =1+ .0625),varwidth =TRUE,size = .5,col ="red",alpha = .25,width = .05) +scale_x_continuous(labels = scales::percent,sec.axis =dup_axis(name="Boxplot of % in professional category: Upper")) +scale_y_continuous(labels = scales::percent,breaks =seq(0,1,.25)) +scale_color_manual(breaks =c("1", "2", "3", "4"),values =c("Red", "Orange", "Dark Blue", "Yellow"),name ="professional\ncategory",labels =colnames(PROFCAT)) +scale_linetype_manual(breaks =c("1", "2", "3", "4"),values =1:4,name ="professional\ncategory",labels =colnames(PROFCAT)) +labs(x ="% in professional category: Upper",y ="% in professional category") +coord_equal() +theme_bw()
4.4 Changing in the relative compensation direction
Another interesting direction compensates for the relative increase in one component by decreasing another component. Below, we illustrate this movement for the pair “Upper” and “Workers”, where we see that the share of “Upper” moves towards its summit when \(h\) increases and the share of “Worker moves away from the summit.
Code
paris_pcseq4 <-CoDa_path(comp_direc =exp(c(1,-1,0,0)),comp_from = paris$PROFCAT,add_opposite =FALSE, # only draw the line for one sidestep_size = .1) opar <-par(mfrow =c(1,3),mai =c(0,.6,0,.6), oma =c(0,0,0,0), mar =c(2,3,0,3))plot.acomp(paris_pcseq4[,-4])plot.acomp(paris_pcseq4["0",-4, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(c(1,1,1), col ="blue", add =TRUE, pch =8, cex =2)plot.acomp(paris_pcseq4[,-1])plot.acomp(paris_pcseq4["0",-1, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(c(1,1,1), col ="blue", add =TRUE, pch =8, cex =2)plot.acomp(paris_pcseq4[,-2])plot.acomp(paris_pcseq4["0",-2, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(c(1,1,1), col ="blue", add =TRUE, pch =8, cex =2)par(opar)
One interesting aspect of this path is that it leads to an axial symmetry in the shares of the components “Other” and “Retired”.
The shares of the components “Other” and “Retired” are maximized at the point where those of “Upper” and “Workers” become equal.
Code
equal_point <-which.min(abs(paris_pcseq4$Workers - paris_pcseq4$Upper))focus_seq <-seq(-60,60)plot(x = focus_seq,y = paris_pcseq4$Other[focus_seq + equal_point], type ="l", col ="blue", xlab =NA, ylab =NA)lines(x = focus_seq,y = paris_pcseq4$Retired[focus_seq + equal_point], col ="red")title(xlab ='steps around the point of equality in "Upper" and "Workers"',ylab ='shares of "Other" and "Retired"')
Another interesting point is that in clr space, the components “Other” and “Retired” are unaffected along this path.
The table below illustrates how the dependent variable TURNOUT would react to a change in the composition of the professional categories, where we focus on the municipality of Paris.
Code
format_VarTab <-function(VT, digits =4, digitsU =max(digits -2, 0), digitsP = digits, digitsPP = digits) {# Print variation tables in a nicer way VT <-t(VT) VT <-as.data.frame(round(VT, digits)) v <-"Variation in units"if (!is.null( VT[[v]])) VT[[v]] <-round(VT[[v]], digitsU) v <-"Variation in %"if (!is.null( VT[[v]])) VT[[v]] <-paste0(round(VT[[v]], digitsP), " %") v <-"Variation in % points"if (!is.null( VT[[v]])) VT[[v]] <-paste0(round(VT[[v]], digitsPP), " %")t(VT)}
Code
paris_num <-which(mun2model$ID_MUN == paris_id)VariationTable4PCparis <-function(Xdir) VariationTable(object = TURNOUT_model,Xvar ="PROFCAT",Xdir = Xdir,inc_size = .1,obs = paris_num,normalize_Xdir =FALSE)paris_varTab <-cbind(VariationTable4PCparis("Upper"),VariationTable4PCparis(dir_gen))row.names(paris_varTab)[4] <-"Variation in % points"paris_varTab[4,] <- paris_varTab[5,] *100paris_varTab <- paris_varTab[-5,]tab_atr <-attributes(VariationTable4PCparis("Upper"))colnames(paris_varTab) <-c("Direction to vertex", "General direction")format_VarTab(paris_varTab, 3,digitsP =3, digitsPP =3) |>kable(caption ="Variation of the turnout when changing PC in different directions", digits =4, booktabs =TRUE) |>add_footnote(sprintf("h = %s, alpha=%s%%", tab_atr$inc_size, round(100*tab_atr$inc_rate,2)))
Variation of the turnout when changing PC in different directions
We now evaluate, for Paris, how changes in the PC composition would impact the vote shares of each candidate. In the following, we consider scenarios of changes defined by the linear paths illustrated in the previous section. The graphic below shows that all scenarios pass through the observed PC composition for Paris and lead to a predicted turnout of about \(72.9\%\). In the hypothetical scenario where Paris has shares in each professional category, the turnout is predicted to drop to about \(67\%\).
For the first two scenarios, the same graph becomes:
Code
ggplot(paris_pred_gg[DIR !="Origin direction",]) +geom_vline(aes(xintercept = INT), data = hline[1:2,]) +geom_text(aes(x = INT, y = YYY, label = LAB, vjust = JJJ), data = hline[1:2,]) +geom_line(aes(y = value, x = PRED_TURNOUT, col =variable, lty = variable)) +scale_y_continuous(breaks =seq(0,10)/10,labels = scales::percent) +scale_x_continuous(labels = scales::percent,sec.axis =dup_axis(name =NULL), limits =c(.5,.9)) +theme_bw() +coord_flip() +labs(y ="share in professional category", x ="predicted turnout",col ="professional\ncategory", lty ="professional\ncategory") +facet_wrap("DIR", ncol =1, scales ="free_x")
An alternative and maybe more straightforward way to show the same information is to look simultaneously at the changes of the dependent and independent variables as a function of \(h\).
Response ilr(VOTE)1 :
Call:
lmCoDa(formula = `ilr(VOTE)1` ~ log(POP_DENSITY) + ilr(PROFCAT),
data = mun2model)
Residuals:
Min 1Q Median 3Q Max
-2.95546 -0.21157 0.00821 0.22060 2.38820
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.213028 0.006649 32.040 <2e-16 ***
log(POP_DENSITY) -0.043064 0.001518 -28.377 <2e-16 ***
ilr(PROFCAT)1 0.203781 0.003510 58.063 <2e-16 ***
ilr(PROFCAT)2 -0.115172 0.003773 -30.528 <2e-16 ***
ilr(PROFCAT)3 -0.009089 0.003933 -2.311 0.0208 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3521 on 34815 degrees of freedom
Multiple R-squared: 0.1209, Adjusted R-squared: 0.1208
F-statistic: 1197 on 4 and 34815 DF, p-value: < 2.2e-16
Response ilr(VOTE)2 :
Call:
lmCoDa(formula = `ilr(VOTE)2` ~ log(POP_DENSITY) + ilr(PROFCAT),
data = mun2model)
Residuals:
Min 1Q Median 3Q Max
-2.5772 -0.2540 -0.0054 0.2498 3.7606
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.528380 0.008809 -59.98 <2e-16 ***
log(POP_DENSITY) 0.034209 0.002010 17.02 <2e-16 ***
ilr(PROFCAT)1 -0.145239 0.004650 -31.24 <2e-16 ***
ilr(PROFCAT)2 0.074919 0.004998 14.99 <2e-16 ***
ilr(PROFCAT)3 0.062590 0.005211 12.01 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4665 on 34815 degrees of freedom
Multiple R-squared: 0.04082, Adjusted R-squared: 0.04071
F-statistic: 370.4 on 4 and 34815 DF, p-value: < 2.2e-16
Response ilr(VOTE)3 :
Call:
lmCoDa(formula = `ilr(VOTE)3` ~ log(POP_DENSITY) + ilr(PROFCAT),
data = mun2model)
Residuals:
Min 1Q Median 3Q Max
-2.29032 -0.18463 0.01998 0.21421 2.18447
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.747150 0.007187 -103.956 <2e-16 ***
log(POP_DENSITY) -0.003843 0.001640 -2.343 0.0191 *
ilr(PROFCAT)1 -0.120232 0.003794 -31.692 <2e-16 ***
ilr(PROFCAT)2 0.061290 0.004078 15.029 <2e-16 ***
ilr(PROFCAT)3 -0.007321 0.004251 -1.722 0.0851 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3806 on 34815 degrees of freedom
Multiple R-squared: 0.03076, Adjusted R-squared: 0.03065
F-statistic: 276.3 on 4 and 34815 DF, p-value: < 2.2e-16
Response ilr(VOTE)4 :
Call:
lmCoDa(formula = `ilr(VOTE)4` ~ log(POP_DENSITY) + ilr(PROFCAT),
data = mun2model)
Residuals:
Min 1Q Median 3Q Max
-2.40942 -0.20221 -0.00447 0.19254 2.09011
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.474646 0.006319 75.110 <2e-16 ***
log(POP_DENSITY) -0.098307 0.001442 -68.158 <2e-16 ***
ilr(PROFCAT)1 -0.032753 0.003336 -9.819 <2e-16 ***
ilr(PROFCAT)2 0.055440 0.003586 15.462 <2e-16 ***
ilr(PROFCAT)3 0.064590 0.003738 17.279 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3346 on 34815 degrees of freedom
Multiple R-squared: 0.1668, Adjusted R-squared: 0.1667
F-statistic: 1743 on 4 and 34815 DF, p-value: < 2.2e-16
Response ilr(VOTE)5 :
Call:
lmCoDa(formula = `ilr(VOTE)5` ~ log(POP_DENSITY) + ilr(PROFCAT),
data = mun2model)
Residuals:
Min 1Q Median 3Q Max
-2.8703 -0.1740 -0.0021 0.1682 5.8459
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.242161 0.005825 41.574 <2e-16 ***
log(POP_DENSITY) 0.039056 0.001329 29.377 <2e-16 ***
ilr(PROFCAT)1 0.112449 0.003075 36.573 <2e-16 ***
ilr(PROFCAT)2 0.031237 0.003305 9.451 <2e-16 ***
ilr(PROFCAT)3 0.047434 0.003446 13.767 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3085 on 34815 degrees of freedom
Multiple R-squared: 0.0626, Adjusted R-squared: 0.0625
F-statistic: 581.3 on 4 and 34815 DF, p-value: < 2.2e-16
Code
aov <-anova(VOTE_model)aov$`approx F`<-round(aov$`approx F`)aov$Pillai <-round(aov$Pillai, 3)kable(aov, caption ="Analysis of variance table for the y-compositional model") |>kable_classic()
Analysis of variance table for the y-compositional model
Df
Pillai
approx F
num Df
den Df
Pr(>F)
(Intercept)
1
0.847
38609
5
34811
0
log(POP_DENSITY)
1
0.195
1688
5
34811
0
ilr(PROFCAT)
3
0.184
455
15
104439
0
Residuals
34815
Code
kable(aov, caption ="Analysis of variance table for the y-compositional model", label ="aov_VOTE", format ="latex", booktabs =TRUE) |>kable_styling(latex_options ="hold_position") |>save_kable(file =here("out/tables/aov_VOTE.tex"))
6.1 Interpreting the impact of a scalar X
6.1.1 Semi-elasticities differences
The clr coefficients correspond to a difference from the mean elasticity.
Code
flevels <-c("-Mean semi elasticity", rev(rename_elec))clr_confint <-confint(VOTE_model, "log(POP_DENSITY)")clr_confint$Y <-factor(clr_confint$Y, flevels)clr_confintB <- clr_confint[1,]clr_confintB$Y <-factor("-Mean semi elasticity", levels = flevels)clr_confintB[,3:6] <-NAclr_T <-coef(VOTE_model, "clr")ME <-as(fitted(VOTE_model, "simplex"),"matrix") %*%t(clr_T[-1,])ME <-melt(data.table(ME), variable.name ="X", value.name ="EST")ME[,Y :=factor("-Mean semi elasticity", levels = flevels)]ggplot(rbind(clr_confint,clr_confintB)) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=EST, col = Y)) +geom_point(aes(x=Y, y = EST, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +geom_violin(aes(y = EST, x = Y),data = ME[X=="log(POP_DENSITY)",]) +labs(y ="clr parameter values & confidence intervals\ndistribution of negative mean semi elasticity", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "clr_b_popdens_with_details.pdf"), height =2.25, width =7)
We can use them to construct differences between the elasticities of two components of the dependent shares with respect to the same explanatory variable.
Code
clr_confint <-confint(VOTE_model, "log(POP_DENSITY)",y_ref =1)clr_confint$Y <-factor(clr_confint$Y, flevels)clr_confintB <- clr_confint[1,]clr_confintB[,3:6] <-NAggplot(rbind(clr_confint,clr_confintB)) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=DIFF, col = Y)) +geom_point(aes(x=Y, y = DIFF, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +labs(y ="semi-elasticity differences & confidence intervals\n(in the denominator is Macron)", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "SE_diff_popdens_with_details.pdf"), height =2, width =7)
6.1.2 Variation table
We can use Taylor approximations to evaluate how the voting behavior reacts to a small change in the number of registered voters. Here, we use increment log(POP_DENSITY) by \(0.1\), which means that population density increases by about 10%.
Code
paris_num <-which(mun2model$ID_MUN == paris_id)paris_varTab <-VariationTable(VOTE_model, "log(POP_DENSITY)",inc_size = .1, obs = paris_num, Ytotal = mun2model$REGISTERED[paris_num])format_VarTab(paris_varTab, 4, 0, 2, 2) |>kable(caption ="Increase of the population density by 10%", digits =4, booktabs =TRUE)
Increase of the population density by 10%
Macron
Le Pen
Mélenchon
Left
Right
No vote
Initial parts
0.2500
0.1582
0.1729
0.0828
0.0880
0.2481
New parts
0.2505
0.1575
0.1734
0.0828
0.0871
0.2487
Semi elasticity
0.0198
-0.0411
0.0313
-0.0011
-0.1077
0.0230
Variation in %
0.2 %
-0.41 %
0.31 %
-0.01 %
-1.08 %
0.23 %
Variation in % points
0.05 %
-0.06 %
0.05 %
0 %
-0.09 %
0.06 %
Variation in units
678
-889
739
-13
-1297
781
Code
format_VarTab(paris_varTab, 4, 0, 2, 2) |>kable(caption ="Increase of the population density by 10\\%", digits =4, booktabs =TRUE,format ="latex", label ="VarTab_VOTE_logPDENS_change", linesep ="") |>kable_styling(latex_options ="hold_position") |>save_kable(here("out/tables/VarTab_VOTE_logPDENS_change.tex"))
It is easy to verify that the variations in % points and units compensate each other.
Code
t(t(round(rowSums(paris_varTab),10)))
[,1]
Initial parts 1.0000000000
New parts 1.0000000000
Semi elasticity -0.0757855744
Variation in % -0.7578557439
Variation in % points 0.0000000000
Variation in units -0.0000000001
6.1.3 Variation scenarios
In this scenario, the population density changes on a grid that is regular in logs. Even extreme changes such as a tenfold population increase or reduction to one-tenth would lead to little variation. When the population of Paris shrinks to \(0.1\%\) of its original size, we see considerable differences.
6.2 Interpreting the impact of an X-compositional variable
6.2.1 Elasticities difference
For X-compositional variables, the clr coefficients also correspond to differences in mean elasticities, where these means are computed for each component of \(X\) over all dependent shares.
Code
flevels2 <-c("-Mean elasticity",flevels)clr_confint <-confint(VOTE_model, "PROFCAT")clr_confint$X <-factor(clr_confint$X, colnames(PROFCAT))clr_confint$Y <-factor(clr_confint$Y, levels = flevels2)clr_confint_all <- clr_confintclr_confint_all$Y <-factor("-Mean elasticity", flevels2)clr_confint_all$SD <- clr_confint_all$ESTclr_confint_all$EST <-NAclr_confint_all$`2.5 %`<-NAclr_confint_all$`97.5 %`<-NAclr_confint_all <-unique(clr_confint_all)levels(ME$Y)[1] <-"-Mean elasticity"ggplot(rbind(clr_confint,clr_confint_all)) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=EST, col = Y)) +geom_point(aes(x=Y, y = EST, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +geom_violin(aes(y = EST, x = Y),data = ME[X!="log(POP_DENSITY)"]) +facet_wrap("X") +labs(y ="parameter values & confidence intervals\ndistribution of negative mean elasticity", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "clr_B_profcat_with_details.pdf"), height =4, width =7)
Code
clr_confint <-confint(VOTE_model, "PROFCAT",y_ref =1)clr_confint$Y <-factor(clr_confint$Y,levels = flevels)clr_confint$X <-factor(clr_confint$X,levels =colnames(PROFCAT))ggplot(clr_confint) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=DIFF, col = Y)) +geom_point(aes(x=Y, y = DIFF, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +facet_wrap("X") +labs(y ="elasticity differences & confidence intervals\n(in the denominator is Macron)", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "E_diff_profcat_with_details.pdf"), height =3.5, width =7)
6.2.2 Variation table
Similar to the previous variation table, we evaluate how the voting behavior reacts to a slight change in the explanatory composition. However, since this variable is multivariate, we must specify the change direction. For a change towards the vertex “Upper” we get the following:
Code
VariationTable4PCparis <-function(Xdir) VariationTable(object = VOTE_model,Xvar ="PROFCAT",Xdir = Xdir,inc_size = .1,obs = paris_num,normalize_Xdir =FALSE,Ytotal = mun2model$REGISTERED[paris_num])paris_varTab <-VariationTable4PCparis(Xdir ="Upper")tab_atr <-attributes(paris_varTab)paris_varTab <-format_VarTab(paris_varTab, 3, 0,digitsP =2,digitsPP =2)paris_varTab |>kable(caption ="Change of (PC) in direction of the vertex \"Upper\"", digits =4, booktabs =TRUE) |>add_footnote(sprintf("h = %s, alpha=%s%%", tab_atr$inc_size, round(100*tab_atr$inc_rate,2)[1]))
Change of (PC) in direction of the vertex “Upper”
Macron
Le Pen
Mélenchon
Left
Right
No vote
Initial parts
0.250
0.158
0.173
0.083
0.088
0.248
New parts
0.251
0.157
0.174
0.084
0.088
0.246
Elasticity
0.060
-0.074
0.059
0.087
0.013
-0.087
Variation in %
0.6 %
-0.74 %
0.59 %
0.87 %
0.12 %
-0.87 %
Variation in % points
0.15 %
-0.12 %
0.1 %
0.07 %
0.01 %
-0.22 %
Variation in units
2038
-1601
1395
981
151
-2964
Note:ah = 0.1, alpha=5.58%
For the general direction, we get the following:
Code
paris_varTab <-VariationTable4PCparis(dir_gen)paris_varTab <-format_VarTab(paris_varTab, 3, 0,digitsP =2,digitsPP =2)paris_varTab |>kable(caption ="Change of PC in a general direction", digits =4, booktabs =TRUE)
Change of PC in a general direction
Macron
Le Pen
Mélenchon
Left
Right
No vote
Initial parts
0.250
0.158
0.173
0.083
0.088
0.248
New parts
0.248
0.162
0.171
0.082
0.087
0.250
Elasticity
-0.072
0.246
-0.136
-0.136
-0.088
0.087
Variation in %
-0.72 %
2.46 %
-1.36 %
-1.36 %
-0.88 %
0.87 %
Variation in % points
-0.18 %
0.39 %
-0.23 %
-0.11 %
-0.08 %
0.22 %
Variation in units
-2459
5325
-3215
-1537
-1056
2941
6.2.3 Variation scenarios
Let us turn to the changes induced by the variation of the socio-professional composition in Paris. Here, we show the changes in the explanatory composition on top of the changes in the predicted shares. The graphic below displays the changes as a function of the category “Upper”.
Code
paris_pred1$VOTE <-as(predict(VOTE_model, space ="simplex", newdata = paris_pred1),"matrix")paris_pred2$VOTE <-as(predict(VOTE_model, space ="simplex", newdata = paris_pred2),"matrix")unmake_pc_vote_names <-c(structure(paste0("(Vote) ", unmake_votenames), names =names(unmake_votenames)),structure(paste0("(PC) ", colnames(PROFCAT)), names =paste0("PROFCAT.", colnames(PROFCAT))))legend_title <-"candidate or block\n(top six)\n\nprofessional category\n(bottom four)"paris_pred1_gg <-melt(data =data.table(paris_pred1),id.vars =c("H_SEQ","F_SHARE","PRED_TURNOUT"),measure.vars =names(unmake_pc_vote_names))paris_pred2_gg <-melt(data =data.table(paris_pred2),id.vars =c("H_SEQ","F_SHARE","PRED_TURNOUT"),measure.vars =names(unmake_pc_vote_names))paris_pred_gg <-rbind(cbind("DIR"="Direction to vertex", paris_pred1_gg),cbind("DIR"="General direction", paris_pred2_gg))paris_pred_gg[,COMPO :=as.character(variable)]paris_pred_gg[,SAHRE_XY :=factor(unmake_pc_vote_names[COMPO], unmake_pc_vote_names)]paris_pred_gg[,COMPO :=fcase(grepl("^VOTE.", variable), "Changes in Y",grepl("^PROFCAT.", variable), "Changes in X")]paris_pred_gg[,COMPO :=factor(COMPO, levels =c("Changes in Y", "Changes in X"))]hline <-data.frame(INT =c(paris$PROFCAT[c(1,1), "Upper"],.25),DIR =c("Direction to vertex","General direction", "General direction"))pclim <-c(.05, .95)ggplot(paris_pred_gg) +geom_vline(aes(xintercept = INT), data = hline[1:2,]) +geom_line(aes(y = value, x = F_SHARE, col =SAHRE_XY, lty = SAHRE_XY)) +scale_y_continuous(breaks =seq(1,9, by =2)/10, labels = scales::percent) +scale_x_continuous(breaks =seq(1,9, by =2)/10, labels = scales::percent) +theme_bw() +labs(y ="shares of\nprofessional categories (bottom) and predicted votes (top)",x ="share of professional category Upper",col = legend_title, lty = legend_title) +facet_grid(rows =vars(COMPO), cols =vars(DIR), shrink =TRUE, scales ="free_y")
We could also illustrate this scenario in a matrix of ternary diagrams. However, in this representation, we are limited to subcompositions of three components.
Code
opar <-par(mfrow =c(2,2), mai =c(.1,0,.2,0))filter1 <- paris_pred1$H_SEQ[between(paris_pred1$F_SHARE, pclim[1], pclim[2])]filter1 <-as.numeric(filter1)filter1 <-min(max(-filter1), max(filter1))filter1 <-intersect(paris_pred1$H_SEQ, seq(-filter1,filter1))filter1 <-as.character(seq(-25,25))colors1 <-diverging_hcl(length(filter1),palette ="Blue-Red-3")plot(acomp(paris_pred1$PROFCAT[filter1,1:3]), col = colors1, pch =19)title("Change X (direction to vertex)")plot(acomp(paris_pred2$PROFCAT[filter1,1:3]), col = colors1, pch =19)title("Change X (general direction)")plot(acomp(paris_pred1$VOTE[filter1,1:3]), col = colors1, pch =19)plot(acomp(paris_pred2$VOTE[filter1,1:3]), col = colors1, pch =19)par(opar)
6.2.4 Direction of maximal share ratio variation
For each dependent share ratio, the direction of maximal variation is given by the squared norm of the difference vector for two rows of the parameter matrix \(B\), where the row indexes correspond to the indexes of the two components that appear in the share ratio. With this in mind we can examine how the share ratio react to a change in their respective maximal variation direction.
The above shows that the professional categories is most useful to discriminate Le Pens share from that of the others. Let us now look at the variation scenario in the direction of maximal variation for the shares of the “Left” and “Le Pen”, which is the strongest opposition in our example. Additionally, we look at “No Vote” vs “Le Pen”, which is the weakest opposition that invloves “Le Pen” and “Right vs Mélenchon”, which is the overall weakest opposition.
Code
Mdiff_VS <-function(den ="Left", num ="Le Pen") { max_diff_dir <-exp(B[num,] - B[den,]) max_diff_dir <- max_diff_dir/sum(max_diff_dir) max_diff_scenario <-VariationScenario(VOTE_model,Xvar ="PROFCAT", Xdir = max_diff_dir) max_diff_scenario$H_SEQ <-rownames(max_diff_scenario) max_diff_scenario$VOTE <- max_diff_scenario$Y max_diff_scenario$PROFCAT <- max_diff_scenario$X max_diff_scenario$SR <- max_diff_scenario$Y[,num] /max_diff_scenario$Y[,den] unmake_names <-paste0(num, " / ", den) unmake_names <-"(SR) Share ratio" unmake_names <-c("SR"= unmake_names, unmake_pc_vote_names) max_diff_scenario_gg <-melt(data =data.table(max_diff_scenario[,c("H_SEQ","VOTE","PROFCAT","SR")]),id.vars =c("H_SEQ"),measure.vars =names(unmake_names)) max_diff_scenario_gg <-cbind(max_diff_scenario_gg, "DIR"=paste0(num, " vs. ", den)) max_diff_scenario_gg[,COMPO :=as.character(variable)] max_diff_scenario_gg[,SAHRE_XY :=factor(unmake_names[COMPO], unmake_names)] max_diff_scenario_gg[,COMPO :=fcase( variable =="SR", "Share ratio",grepl("^VOTE.", variable), "Changes in Y",grepl("^PROFCAT.", variable), "Changes in X")] max_diff_scenario_gg[,COMPO :=factor(COMPO, levels =c("Share ratio", "Changes in Y", "Changes in X"))] max_diff_scenario_gg}max_diff_scenario_gg <-rbind(Mdiff_VS(den ="Left", num ="Le Pen"),Mdiff_VS(den ="No vote", num ="Le Pen"),Mdiff_VS(num ="Mélenchon", den ="Right"))legend_title3 <-"share ratio (SR)\n\ncandidate or block (Vote)\n\nprofessional category (PC)"ggplot(max_diff_scenario_gg) +geom_vline(aes(xintercept =0)) +geom_line(aes(y = value, x =as.integer(H_SEQ)/10, col =SAHRE_XY, lty = SAHRE_XY)) +scale_colour_discrete_qualitative() +theme_bw() +labs(y ="X-shares (bottom), Y-shares (middle), Y-share ratio (top)",x ="value of h",col = legend_title3, lty = legend_title3) +facet_grid(rows =vars(COMPO), cols =vars(DIR), scales ="free") +theme(legend.key.width =unit(3, "line"))
The direction of changes in X can be easily computed from the parameter matrix:
Code
clrInv2 <-function(x) exp(x)/sum(exp(x))uStar <-function(u) round(clrInv2(u/sqrt(sum(u^2))),3)rbind(data.frame("Y-ratio"="Le Pen vs Left", t(uStar(B["Le Pen", ] - B["Left",]))),data.frame("Y-ratio"="Le Pen vs No Vote", t(uStar(B["Le Pen", ] - B["No vote",]))),data.frame("Y-ratio"="Mélenchon vs Right", t(uStar(B["Mélenchon",] - B["Right", ]))))
Y.ratio
Upper
Workers
Retired
Other
Le Pen vs Left
0.144
0.499
0.151
0.206
Le Pen vs No Vote
0.231
0.482
0.133
0.153
Mélenchon vs Right
0.510
0.149
0.181
0.160
6.2.5 Share ratio elasticity tables
Another way to interpret covariate impacts in XY-compositional models is to consider share ratio elasticities. These measure the % change in the ratio of two components of Y relative to the % change in the ratio of two components of X.
The four tabs below illustrate the share ratio elasticities for the case where the direction is fixed towards one of the vertices of the PC composition. The blank rows in the graphic are because some share ratios do not change for this particular direction.
While the previous charts used the same direction for all share ratio elasticities, we can also change them as a function of the components used in the share ratio of X. The compensating direction is particularly interesting for this purpose because the changes in the Y composition are entirely explained by changes in the two compensating components of the X composition.
The values of the share-ratio elasticities shown above can be understood as a decomposition of the original clr parameters. More precisely, it is possible to show that, within any of the 24 blocks summing over all 15 elasticity values, we obtain \(0.5 D_y D_x B^*\), a matrix proportional to the clr parameter values.
Code
B <-coef(VOTE_model,"clr",split =TRUE)[["ilr(PROFCAT)"]]B_df <-data.table(B,keep.rownames =TRUE)B_df <-melt(B_df,id.vars ="rn")B_df$Y <-factor(B_df$variable,levels =colnames(B))B_df$X <-factor(B_df$rn,levels =rev(rownames(B)))setDT(B_df)setDT(sre_tab)B_df2 <- sre_tab[,list(value2 =sum(SRE) /ncol(B) /nrow(B) *2), by =c("Yl", "Xl")]B_df2[B_df, value := i.value, on =c("Yl"="Y", "Xl"="X")]B_df2
Yl
Xl
value2
value
Macron
Upper
0.0501914
0.0501914
Le Pen
Upper
-0.0833846
-0.0833846
Mélenchon
Upper
0.0495960
0.0495960
Left
Upper
0.0771850
0.0771850
Right
Upper
0.0031394
0.0031394
No vote
Upper
-0.0967272
-0.0967272
Macron
Workers
-0.0393279
-0.0393279
Le Pen
Workers
0.2346584
0.2346584
Mélenchon
Workers
-0.0877041
-0.0877041
Left
Workers
-0.0887457
-0.0887457
Right
Workers
-0.0673250
-0.0673250
No vote
Workers
0.0484443
0.0484443
Macron
Retired
0.0238771
0.0238771
Le Pen
Retired
-0.1054013
-0.1054013
Mélenchon
Retired
0.0120279
0.0120279
Left
Retired
0.0370594
0.0370594
Right
Retired
0.0216541
0.0216541
No vote
Retired
0.0107828
0.0107828
Macron
Other
-0.0347406
-0.0347406
Le Pen
Other
-0.0458725
-0.0458725
Mélenchon
Other
0.0260802
0.0260802
Left
Other
-0.0254987
-0.0254987
Right
Other
0.0425316
0.0425316
No vote
Other
0.0375001
0.0375001
The visual impression blow of parameter matrix \(B\), confirms the link to the more detailed share ratio representation above.
The example used to illustrate this work is taken from a Statistical Consulting project class of the Master in Data Science for Social Sciences of The Toulouse School of Economics, in cooperation with the Market Research agency BVA. The project was a great experience, and we want to thank all the participants, namely Olivier Hennebelle and Alejandro Lara, who advised the project from the side of BVA, and our four students, Claire Lebrun, Malo Bert, Kyllian James and Gael Charrier.
Source Code
---title: "Modeling the French presidential elections of 2022 with CoDa tools"author: - Lukas Dargel - Christine Thomas-Agnandate: "2024-03-08"date-modified: "last-modified"format: html: theme: minty toc: true number-sections: true code-fold: true code-tools: true df-print: kable embed-resources: true standalone: true self-contained: trueexecute: cache: true fig-show: hold error: true---```{r}#| include: falsedir.create(here::here("out","figures"),showWarnings =FALSE,recursive =TRUE)dir.create(here::here("out","tables"),showWarnings =FALSE,recursive =TRUE)```This vignette accompanies our article:> Dargel, Lukas, and Christine Thomas-Agnan (2024).> *Pairwise share ratio interpretations of compositional regression models*.> Computational Statistics & Data Analysis, Volume 195, July 2024, 107945> Accessed `r format(Sys.Date(), "%d %b %Y")`.> Online at <https://doi.org/10.1016/j.csda.2024.107945>The vignette contains all the code necessary to reproduce the article's results and provides additional illustrations.```{r setup}#| warning: false#| message: false# cran dependencieslibrary("colorspace")library("CoDaImpact")library("compositions")library("data.table")library("here")library("ggplot2")library("kableExtra")library("skimr")library("stringr")library("zCompositions")# local functionssave_baseplot <-function(device, ..., which){suppressMessages({dev.copy(device, ..., which)dev.off()invisible(TRUE) })}# optionsoptions(knitr.kable.NA ='',scipen =9,warn =1)# datamun_elec2census <-readRDS(here("out/data/mun_elec2census.Rds")) # election data combined with the censusmun_elec <-readRDS(here("in/data/mun_elec.Rds")) # original election data ```# IntroductionWe illustrate the use of compositional data analysis (CoDa) to analyze the results of political elections using the example of the French presidential elections of 2022.Our primary focus is the illustration of mathematical tools that enable new ways to interpret CoDa regression models.Therefore, other issues, such as variable selection and zeros handling, are treated more briefly.# Descriptive statisticsThe study combines the official election results with census data:- [The original census data is available on the INSEE website](https://www.insee.fr/fr/statistiques/6543200#consulter)- [The French government provides the official election results via this link](https://www.data.gouv.fr/fr/datasets/election-presidentielle-des-10-et-24-avril-2022-resultats-definitifs-du-1er-tour/)## Data overviewLet us have a first look at the combined data set.```{r}skim(mun_elec2census)```## Compare losses due to preprocessingWe lose some of the registered voters in France because the census data only covers individuals living in mainland France.Our analysis does not cover the overseas territories and voters living in a foreign country.Let us compare the original and combined voting data to assess the losses and potential biases.```{r}rename_elec <-c(MACRON_Emmanuel ="Macron",LE_PEN_Marine ="Le Pen",MELENCHON_Jean_Luc ="Mélenchon",LEFT ="Left",RIGHT ="Right",NON_VOTE ="No vote",ARTHAUD_Nathalie ="Arthaud",ROUSSEL_Fabien ="Roussel",LASSALLE_Jean ="Lasalle",ZEMMOUR_Eric ="Zemmour",HIDALGO_Anne ="Hidalgo",JADOT_Yannick ="Jadot",PECRESSE_Valerie ="Pécresse",POUTOU_Philippe ="Poutou",DUPONT_AIGNAN_Nicolas ="Dupont-Aignan",ABSTENTIONS ="Abstention",BLANK ="Blank",INVALID ="Invalid",EXPRESSED ="Expressed",REGISTERED ="Registered")elec_cols <-setdiff(names(mun_elec), c("ID_MUN", "NAME_MUN", "NAME_DEP"))pct <-function(x) scales::percent(x,accuracy = .1)summarize_elec <-function(dt) { dt <-data.table(Municipalities =nrow(dt),Departements =length(unique(dt$NAME_DEP)), dt[,lapply(.SD, sum), .SDcols = elec_cols] ) dt[,EXPRESSED :=NA] dt[,REGISTERED :=sum(.SD), .SDcols = elec_cols] dt[,EXPRESSED := REGISTERED - ABSTENTIONS - BLANK - INVALID] show_dt <-data.table(t(dt),keep.rownames =TRUE) dont_use <-c("Municipalities","Departements") show_dt[!rn %in% dont_use,"% of REGISTERED":=pct(V1/dt[["REGISTERED"]])] dont_use <-c(dont_use, "ABSTENTIONS", "BLANK", "INVALID", "REGISTERED") show_dt[!rn %in% dont_use,"% of EXPRESSED":=pct(V1/dt[["EXPRESSED"]])]names(show_dt)[1:2] <-c("Indicator", "Count") show_dt} elec_original <-summarize_elec(mun_elec)elec_combined <-summarize_elec(mun_elec2census)elec_combined <-cbind(Loss = elec_original[["Count"]] - elec_combined[["Count"]], elec_combined[,-1])elec_combined <-cbind("% Loss"=pct(elec_combined[["Loss"]]/elec_combined[["Count"]]), elec_combined)elec_original[Indicator %in%names(rename_elec),Indicator := rename_elec[Indicator]]kable(cbind(elec_original, elec_combined)) |>kable_classic(full_width = F) |>add_header_above(c(" "=1, "Full election data"=3, "Combined election data"=5)) |>pack_rows(group_label ="Units", 1,2,hline_after =TRUE) |>pack_rows(group_label ="Not expressed", 3,5,hline_after =TRUE) |>pack_rows(group_label ="Expressed", 6,17,hline_after =TRUE) |>pack_rows(group_label ="Totals", 18,19,hline_after =TRUE) |>column_spec(1, bold = T) ```## Check for problems with zerosBecause CoDa models cannot effectively handle cases where dependent or independent compositional variables have zero proportions, it is necessary to address the zeros issue before estimating the models.To address this, we will initially employ amalgamations and impute the remaining zero values by a small constant.The amalgamation process combines various professional categories or political candidates into larger blocks.However, it is important to note that our proposed grouping is ad hoc, and a case study of electoral sociology should certainly place more thought into these actions.### In the vote dataWe first check for zeros in the dependent variables.```{r}zero_summary <-function(dt) dt[,lapply(.SD, function(x) pct(sum(x==0)/.N))]t(zero_summary(mun_elec2census[,..elec_cols]))```To reduce the number of zero votes, we group the minor candidates into a "left" and a "right" block, each containing four candidates.Additionally, we group invalid votes, blank votes and abstentions into the "No vote" block.After these amalgamations, we obtain a visual impression of the zero patterns in the new composition using the `zCompositions::zPatterns()` function.The graph reveals that almost $98.5\%$ of the municipalities have non-zero compositions now.```{r}left_bloc <-c("ARTHAUD_Nathalie", "ROUSSEL_Fabien", "HIDALGO_Anne", "JADOT_Yannick", "POUTOU_Philippe")right_bloc <-c("ZEMMOUR_Eric", "PECRESSE_Valerie", "LASSALLE_Jean", "DUPONT_AIGNAN_Nicolas")VOTE <-cbind( mun_elec2census[,c("MACRON_Emmanuel", "LE_PEN_Marine", "MELENCHON_Jean_Luc")],LEFT =as.integer(rowSums(mun_elec2census[,..left_bloc])),RIGHT =as.integer(rowSums(mun_elec2census[,..right_bloc])),NON_VOTE =as.integer(rowSums(mun_elec2census[,c("ABSTENTIONS", "INVALID","BLANK"),])))colnames(VOTE) <- rename_elec[colnames(VOTE)]vote_names <- rename_elec[colnames(VOTE)]zPatterns( VOTE,bar.labels =TRUE,label="0")````r zv <- sum(0 != rowSums(VOTE == 0))` The remaining zeros are imputed by simply adding one vote to all candidates (or blocks) in each municipality with at least one zero.This imputation affected `r zv` municipalities, adding an overall number of `r zv*ncol(VOTE)` fictive voters.```{r}is_zero_row <-0!=rowSums(VOTE ==0) VOTE_IMP <- VOTE + is_zero_rowcbind(sum(is_zero_row),sum(is_zero_row) *ncol(VOTE),sum(VOTE_IMP) -sum(VOTE))```#### Before and after imputationWe compare the data distribution in the original and imputed data sets to ensure that the imputation does not distort our analysis.Comparing the two distributions below reveals no noticeable changes in the mean but a slight increase in the variance.The effect on the variance is not surprising since our imputation method replaces zeros (red bars in the graphic) with points close to the edge of the triangle, which are extreme values in the simplex sense.```{r}simplex_isodensity <-function( data,quantiles =c(0.5,1:9,9.5)/10,labels =names(cen),col1 ="black",col2 ="grey45",plot_data =TRUE) {# code adapted from:# Van den Boogaart and Tolosana-Delgado (2013), page 52stopifnot(ncol(data) ==3) cen <-mean(data) var <-var(data)if (plot_data) plot(data, pch =".", col ="grey85", labels = labels, mp =NULL, lenMissingTck =0.02)plot(cen,pch=4, col = col1, labels = labels, add = plot_data)for (p in quantiles) { r =sqrt(qchisq(p=p,df=2))ellipses(cen,var,r, col=col2) }}``````{r}#| fig-show: "hold"opar <-par(mar=c(0,3,0,3), mfrow =c(1,2))simplex_isodensity(acomp(VOTE_IMP[,1:3]))title(sub ="Imputed", outer =FALSE)simplex_isodensity(acomp(VOTE[,1:3]))title(sub ="Original")par(opar)``````{r}#| fig-show: "hold"opar <-par(mar=c(0,3,0,3), mfrow =c(1,2))simplex_isodensity(acomp(VOTE_IMP[,-(1:3)]))title(sub ="Imputed", outer =FALSE)simplex_isodensity(acomp(VOTE[,-(1:3)]))title(sub ="Original")par(opar)```### In the professional categories (PC) dataThe professional categories data directly comes from the French population census provided by the INSEE.It categorizes the population above 15 into one of eight groups.The categories correspond to- `C19_POP15P_CS1`: Agriculteurs exploitants- `C19_POP15P_CS2`: Artisans, Commerçants, Chefs d'entreprise- `C19_POP15P_CS3`: Cadres et Professions intellectuelles supérieures- `C19_POP15P_CS4`: Professions intermédiaires- `C19_POP15P_CS5`: Employés- `C19_POP15P_CS6`: Ouvriers- `C19_POP15P_CS7`: Retraités- `C19_POP15P_CS8`: Autres sans activité professionnelle```{r}pc_cols <-names(mun_elec2census)pc_cols <- pc_cols[grep("^C19_", pc_cols)]t(zero_summary(mun_elec2census[,..pc_cols]))```Since the number of municipalities having zero share of some of these categories is too large, we amalgamate the eight original groups into four broader categories.Using the new categorization, about $95\%$ of the municipalities have no zero value in all components.```{r}PROFCAT <-cbind("Upper"= mun_elec2census[,C19_POP15P_CS3 + C19_POP15P_CS4],"Workers"= mun_elec2census[,C19_POP15P_CS5 + C19_POP15P_CS6],"Retired"= mun_elec2census[,C19_POP15P_CS7],"Other"= mun_elec2census[,C19_POP15P_CS1 + C19_POP15P_CS2 + C19_POP15P_CS8])zPatterns( PROFCAT,label="0",bar.labels =TRUE,axis.labels =c(NA,NA))````r zv <- sum(0 != rowSums(PROFCAT == 0))` Using the previous strategy, we add one individual to all PCs in each municipality with at least one zero component.Overall, this imputation affected `r zv` municipalities, adding a total of `r zv*ncol(PROFCAT)` fictive individuals.```{r}is_zero_row <-0!=rowSums(PROFCAT ==0) PROFCAT_IMP <- PROFCAT + is_zero_rowcbind(sum(is_zero_row),sum(is_zero_row) *ncol(PROFCAT),sum(PROFCAT_IMP) -sum(PROFCAT))```#### Before and after imputationWe proceed to compare the distribution of the PC compositions before and after our imputation process.In the graphics below, the mean seems unaffected, while the variance grows more substantially than in the case of the VOTE composition.```{r}#| fig-show: "hold"opar <-par(mar=c(0,4,0,4), mfrow =c(1,2))simplex_isodensity(acomp(PROFCAT_IMP[,(1:3)]))title(sub ="Imputed", outer =FALSE)simplex_isodensity(acomp(PROFCAT[,(1:3)]))title(sub ="Original")par(opar)``````{r}#| fig-show: "hold"opar <-par(mar=c(0,3,0,3), mfrow =c(1,2))simplex_isodensity(acomp(PROFCAT_IMP[,(2:4)]))title(sub ="Imputed", outer =FALSE)simplex_isodensity(acomp(PROFCAT[,(2:4)]))title(sub ="Original")par(opar)```# Exploratory analysisFor the analysis, we proceed with the imputed VOTE and PC data.```{r}mun2explore <-data.table(ID_MUN = mun_elec2census$ID_MUN,REGISTERED =rowSums(VOTE_IMP))mun2explore <-cbind(mun2explore, VOTE_IMP, PROFCAT_IMP)```## Range of Values```{r}logit <-function(x) log(x/(1-x))ggplot(mun2explore[REGISTERED <300000,],) +geom_point(aes(y =log(1-`No vote`/REGISTERED), x =logit(1-`No vote`/REGISTERED))) +theme_bw() +theme()```## HeteroskedasticitySince election data corresponds to aggregations of individual choices, we should suspect heteroskedasticity.The scatter plot below shows that the number of voters directly impacts the turnout rate.Additionally, its volatility decreases with the size of the municipality in terms of the number of voters.```{r}#| warning: false#| message: falseggplot(mun2explore[REGISTERED <300000,], aes(x =sqrt(REGISTERED), y =1-`No vote`/REGISTERED)) +geom_point() +geom_smooth() +theme_bw() +theme()```The graphic below reveals the same pattern for all components of the VOTE data.```{r}#| fig-show: holdseq125 <-c(0, unlist(lapply(10^seq(10), "*", c(1,2,5))))seqSqr <-seq(0,15)^2*1000namesSqr <-c(sprintf("..., %s]", seqSqr[-1]), sprintf("(%s, ...", seqSqr[length(seqSqr)]))mun2explore[,REGISTERED_BINS :=cut(REGISTERED,c(seqSqr,Inf),labels = namesSqr)]mun2explore_long <-melt(mun2explore,id.vars =c("ID_MUN","REGISTERED_BINS"),measure.vars =colnames(VOTE))mun2explore_long[,value_rel := value/sum(value), by ="ID_MUN"]mun2explore_long[,variable:=factor(variable, levels = rename_elec)]ggplot(mun2explore_long, aes(x = REGISTERED_BINS, y = value_rel)) +geom_boxplot(outlier.size = .1) +facet_wrap("variable", ncol =3) +scale_y_continuous(name ="Vote share",labels = scales::percent) +labs(x ="Number of registered voters") +theme_bw() +theme(axis.text.x =element_text(angle =90,hjust =1,vjust = .5)) ggsave(here("out/figures", "vote6_boxplots.pdf"), width =7, height =6)```For the professional categories, we observe again the same pattern.```{r}#| fig-show: holdmun2explore_long2 <-melt(mun2explore,id.vars =c("ID_MUN","REGISTERED_BINS"),measure.vars =colnames(PROFCAT))mun2explore_long2[,value_rel := value/sum(value), by ="ID_MUN"]ggplot(mun2explore_long2, aes(x = REGISTERED_BINS, y = value_rel)) +geom_boxplot(outlier.size = .1) +facet_wrap("variable", ncol =2) +scale_y_continuous(name ="Vote share",labels = scales::percent) +labs(x ="Number of registered voters") +theme_bw() +theme(axis.text.x =element_text(angle =90,hjust =1,vjust = .5))ggsave(here("out/figures", "profcat4_boxplots.pdf"), width =7, height =6)``````{r}# After the imputations, we decide on the data for modelingmun2model <-data.frame(ID_MUN = mun_elec2census$ID_MUN,REGISTERED =rowSums(as.matrix(VOTE_IMP)),REGISTERED_G =Reduce("*", lapply(VOTE_IMP, as.numeric)),POP_DENSITY = mun_elec2census$POPULATION / mun_elec2census$AREA)mun2model[["VOTE"]] <-as.matrix(VOTE_IMP)/mun2model$REGISTEREDmun2model[["PROFCAT"]] <-as.matrix(PROFCAT_IMP)/rowSums(PROFCAT_IMP)```# Understanding "linear" changes in compositional variablesIn this section, we illustrate how compositional variables change based on the PC composition in Paris.Below, we have the observed composition for the Paris municipality.```{r}paris_id <-"75056"paris <- mun2model[mun2model$ID_MUN == paris_id,,drop =FALSE]paris$PROFCAT```In the following subsections, take the composition of Paris as a point of departure and add a sequence of linear increments, where linearity refers to the simplex geometry.It should be evident that a linear increment of any multivariate vector is defined by a direction and a step size $h=0.1$.Since we normalized the direction to unit length $h$ reflects the Aitchison distance between two sequential points.## Changing in direction of the first vertex```{r}paris_pcseq1 <-CoDa_path(comp_direc =c(2,1,1,1),comp_from = paris$PROFCAT,add_opposite =TRUE,step_size = .1) # take the same step size as before```We consider the direction that points to the first vertex of the simplex, in our example, the share of "Upper".The two ternary diagrams below reveal how the PC composition changes along the path defined by this direction.The initial composition observed for Paris is shown as a red cross.One particularity of this direction is that the subcomposition excluding "Other" remains constant along the path.```{r}#| fig-show: hold#| message: false#| fig-pos: hopar <-par(mfrow =c(1,2),mai =c(0,.6,0,.6), oma =c(0,0,0,0), mar =c(2,3,0,3))plot.acomp(paris_pcseq1[,-4])plot.acomp(paris_pcseq1["0",-4, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(paris_pcseq1[,-1])plot.acomp(paris_pcseq1["0",-1, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)save_baseplot(pdf, here("out/figures/","Xe_change_ternary.pdf"), width =7, height =3)par(opar)```Both graphics above make clear that the ratios between all components except "Upper" remain constant.This also explains why the path in the left ternary plot appears linear in the Euclidean sense.Below, we represent the same path as a stacked bar plot, where the ratio of bar heights remains constant except when "Upper" is involved.```{r}#| fig-show: holdnames(paris_pcseq1) <-colnames(PROFCAT)window <-as.character(seq(-40,40,by =2))barplot(acomp(paris_pcseq1[window,,drop=FALSE]), xlab ="value of h")save_baseplot(pdf, here("out/figures","Xe_change_barplot.pdf"))```Another way to look at the changing composition is to present the changes to all components as a function of the share of "Upper".In this graphic, we additionally display the empirical distribution of the share of "Upper" over all municipalities.The vertical line intersects the other in the observed composition for Paris, showing that Paris is an extreme observation in the "Upper" dimension.```{r}#| fig-show: holdparis_pcseq1_gg <-data.frame(paris_pcseq1)colnames(paris_pcseq1_gg) <-paste0("PC", 1:4)ggplot(paris_pcseq1_gg) +geom_vline(xintercept = paris$PROFCAT[[1]]) +geom_line(aes(x = PC1, y = PC1, col ="1", lty ="1"), lwd = .51) +geom_line(aes(x = PC1, y = PC2, col ="2", lty ="2"), lwd = .51) +geom_line(aes(x = PC1, y = PC3, col ="3", lty ="3"), lwd = .51) +geom_line(aes(x = PC1, y = PC4, col ="4", lty ="4"), lwd = .51) +geom_hline(yintercept =1, col ="black") +geom_boxplot(data =data.frame(PC1 = mun2model$PROFCAT[,1]),mapping =aes(x = PC1, y =1+ .0625),varwidth =TRUE,size = .5,col ="red",alpha = .25,width = .05) +scale_x_continuous(labels = scales::percent,sec.axis =dup_axis(name="Boxplot of % in professional category: Upper")) +scale_y_continuous(labels = scales::percent,breaks =seq(0,1,.25)) +scale_color_manual(breaks =c("1", "2", "3", "4"),values =c("Red", "Orange", "Dark Blue", "Yellow"),name ="professional\ncategory",labels =colnames(PROFCAT)) +scale_linetype_manual(breaks =c("1", "2", "3", "4"),values =1:4,name ="professional\ncategory",labels =colnames(PROFCAT)) +labs(x ="% in professional category: Upper",y ="% in professional category") +coord_equal() +theme_bw()ggsave(here("out/figures/Xe_change_scatter_with_details.pdf"), height =6, width =6)``````{r}#| include: falseparis_pcseq1_gg <-data.frame(paris_pcseq1)colnames(paris_pcseq1_gg) <-paste0("PC", 1:4)ggplot(paris_pcseq1_gg) +geom_vline(xintercept = paris$PROFCAT[[1]]) +geom_line(aes(x = PC1, y = PC1, col ="1", lty ="1"), lwd = .51) +geom_line(aes(x = PC1, y = PC2, col ="2", lty ="2"), lwd = .51) +geom_line(aes(x = PC1, y = PC3, col ="3", lty ="3"), lwd = .51) +geom_line(aes(x = PC1, y = PC4, col ="4", lty ="4"), lwd = .51) +geom_hline(yintercept =1, col ="black") +geom_boxplot(data =data.frame(PC1 = mun2model$PROFCAT[,1]),mapping =aes(x = PC1, y =1+ .0625),varwidth =TRUE,size = .5,col ="red",alpha = .25,width = .05) +scale_x_continuous(labels = scales::percent) +scale_y_continuous(labels = scales::percent,breaks =seq(0,1,.25)) +scale_color_manual(breaks =c("1", "2", "3", "4"),values =c("Red", "Orange", "Dark Blue", "Yellow"),name ="professional\ncategory",labels =colnames(PROFCAT)) +scale_linetype_manual(breaks =c("1", "2", "3", "4"),values =1:4,name ="professional\ncategory",labels =colnames(PROFCAT)) +labs(x ="% in professional category: Upper",y ="% in professional category") +coord_equal() +theme_bw()ggsave(here("out/figures/Xe_change_scatter.pdf"), height =6, width =6)```## Changing in a general directionWe repeat the same linear increment procedure for a more general direction.Here, we chose a path for which the share of "Upper" passes through the whole interval $(0,1)$, because we want to show graphics comparable to the previous ones.```{r}dir_gen <-c(.1,.5,.25,.15)paris_pcseq2 <-CoDa_path(comp_direc = dir_gen,comp_from = paris$PROFCAT,add_opposite =TRUE,step_size =attr(paris_pcseq1,"step_size")) # take the same step size as before```An alternative direction is the one that points from the observed point for Paris to the origin of the simplex.The two ternary diagrams below show a path generated along this direction, where the red cross is the empirical composition for Paris and the blue star is the origin.```{r}#| fig-show: "hold"opar <-par(mfrow =c(1,2),mai =c(0,.6,0,.6), oma =c(0,0,0,0), mar =c(2,3,0,3))plot.acomp(paris_pcseq2[,-4])plot.acomp(paris_pcseq2["0",-4, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(paris_pcseq2[,-1])plot.acomp(paris_pcseq2["0",-1, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)save_baseplot(pdf, here("out/figures/","Xu_change_ternary.pdf"), width =7, height =3, bg ="transparent")par(opar)```The simplex-linear path this direction generates looks more complex than the previous one because it is non-linear in the Euclidean sense.For this path, the ratios of the other components are no longer constant, and the changes are generally not monotonic.The lack of monotonicity is evident from the shares of "Other" and "Retired" that peak close to the center of our variation and decline towards both edges.```{r}names(paris_pcseq2) <-colnames(PROFCAT)window <-as.character(seq(-80,80,by =4))barplot(acomp(paris_pcseq2[window,,drop=FALSE]), xlab ="value of h")save_baseplot(pdf, here("out/figures/","Xu_change_barplot.pdf"))```The intersection with the black line is again the empirical composition of Paris.```{r}#| fig-show: holdparis_pcseq2_gg <-data.frame(paris_pcseq2)colnames(paris_pcseq2_gg) <-paste0("PC", 1:4)ggplot(paris_pcseq2_gg) +geom_vline(xintercept = paris$PROFCAT[[1]]) +geom_line(aes(x = PC1, y = PC1, col ="1", lty ="1"), lwd = .51) +geom_line(aes(x = PC1, y = PC2, col ="2", lty ="2"), lwd = .51) +geom_line(aes(x = PC1, y = PC3, col ="3", lty ="3"), lwd = .51) +geom_line(aes(x = PC1, y = PC4, col ="4", lty ="4"), lwd = .51) +geom_hline(yintercept =1, col ="black") +geom_boxplot(data =data.frame(PC1 = mun2model$PROFCAT[,1]),mapping =aes(x = PC1, y =1+ .0625),varwidth =TRUE,size = .5,col ="red",alpha = .25,width = .05) +scale_x_continuous(labels = scales::percent,sec.axis =dup_axis(name="Boxplot of % in professional category: Upper")) +scale_y_continuous(labels = scales::percent,breaks =seq(0,1,.25)) +scale_color_manual(breaks =c("1", "2", "3", "4"),values =c("Red", "Orange", "Dark Blue", "Yellow"),name ="professional\ncategory",labels =colnames(PROFCAT)) +scale_linetype_manual(breaks =c("1", "2", "3", "4"),values =1:4,name ="professional\ncategory",labels =colnames(PROFCAT)) +labs(x ="% in professional category: Upper",y ="% in professional category") +coord_equal() +theme_bw()ggsave(here("out/figures/Xu_change_scatter_with_details.pdf"), height =6, width =6)``````{r}#| include: falseparis_pcseq2_gg <-data.frame(paris_pcseq2)colnames(paris_pcseq2_gg) <-paste0("PC", 1:4)ggplot(paris_pcseq2_gg) +geom_vline(xintercept = paris$PROFCAT[[1]]) +geom_vline(xintercept =1/4, col ="grey55") +geom_line(aes(x = PC1, y = PC1, col ="1", lty ="1"), lwd = .51) +geom_line(aes(x = PC1, y = PC2, col ="2", lty ="2"), lwd = .51) +geom_line(aes(x = PC1, y = PC3, col ="3", lty ="3"), lwd = .51) +geom_line(aes(x = PC1, y = PC4, col ="4", lty ="4"), lwd = .51) +geom_hline(yintercept =1, col ="black") +geom_boxplot(data =data.frame(PC1 = mun2model$PROFCAT[,1]),mapping =aes(x = PC1, y =1+ .0625),varwidth =TRUE,size = .5,col ="red",alpha = .25,width = .05) +scale_x_continuous(labels = scales::percent) +scale_y_continuous(labels = scales::percent,breaks =seq(0,1,.25)) +scale_color_manual(breaks =c("1", "2", "3", "4"),values =c("Red", "Orange", "Dark Blue", "Yellow"),name ="professional\ncategory",labels =colnames(PROFCAT)) +scale_linetype_manual(breaks =c("1", "2", "3", "4"),values =1:4,name ="professional\ncategory",labels =colnames(PROFCAT)) +labs(x ="% in professional category: Upper",y ="% in professional category") +coord_equal() +theme_bw()ggsave(here("out/figures/Xu_change_scatter.pdf"), height =6, width =6)```## Changing in direction of the originIf the direction and the starting point coincide, the path will pass through the origin of the simplex.```{r}paris_pcseq3 <-CoDa_path(comp_direc = paris$PROFCAT,comp_from = paris$PROFCAT,add_opposite =TRUE,step_size =attr(paris_pcseq1,"step_size")) # take the same step size as before```The two ternary diagrams below show a path generated along this direction, where the red cross is the empirical composition for Paris and the blue star is the origin.```{r}#| fig-show: "hold"opar <-par(mfrow =c(1,2),mai =c(0,.6,0,.6), oma =c(0,0,0,0), mar =c(2,3,0,3))plot.acomp(paris_pcseq3[,-4])plot.acomp(paris_pcseq3["0",-4, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(c(1,1,1), col ="blue", add =TRUE, pch =8, cex =2)plot.acomp(paris_pcseq3[,-1])plot.acomp(paris_pcseq3["0",-1, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(c(1,1,1), col ="blue", add =TRUE, pch =8, cex =2)par(opar)```The simplex-linear path generated by this direction is again non-linear in the Euclidean sense and also leads to non-monotonic changes in the shares of "Other" and "Retired".```{r}names(paris_pcseq3) <-colnames(PROFCAT)window <-as.character(seq(-40,40,by =2))barplot(acomp(paris_pcseq3[window,,drop=FALSE]), xlab ="value of h")```When looking at the functional representation of this linear path, we see that the shares for each professional category cross at the origin of the 4-dimensional simplex.The intersection with the black line is again the empirical composition of Paris.```{r}#| fig-show: holdparis_pcseq3_gg <-data.frame(paris_pcseq3)colnames(paris_pcseq3_gg) <-paste0("PC", 1:4)ggplot(paris_pcseq3_gg) +geom_vline(xintercept = paris$PROFCAT[[1]]) +geom_vline(xintercept =1/4, col ="grey55") +geom_line(aes(x = PC1, y = PC1, col ="1", lty ="1"), lwd = .51) +geom_line(aes(x = PC1, y = PC2, col ="2", lty ="2"), lwd = .51) +geom_line(aes(x = PC1, y = PC3, col ="3", lty ="3"), lwd = .51) +geom_line(aes(x = PC1, y = PC4, col ="4", lty ="4"), lwd = .51) +geom_hline(yintercept =1, col ="black") +geom_boxplot(data =data.frame(PC1 = mun2model$PROFCAT[,1]),mapping =aes(x = PC1, y =1+ .0625),varwidth =TRUE,size = .5,col ="red",alpha = .25,width = .05) +scale_x_continuous(labels = scales::percent,sec.axis =dup_axis(name="Boxplot of % in professional category: Upper")) +scale_y_continuous(labels = scales::percent,breaks =seq(0,1,.25)) +scale_color_manual(breaks =c("1", "2", "3", "4"),values =c("Red", "Orange", "Dark Blue", "Yellow"),name ="professional\ncategory",labels =colnames(PROFCAT)) +scale_linetype_manual(breaks =c("1", "2", "3", "4"),values =1:4,name ="professional\ncategory",labels =colnames(PROFCAT)) +labs(x ="% in professional category: Upper",y ="% in professional category") +coord_equal() +theme_bw()```## Changing in the relative compensation directionAnother interesting direction compensates for the relative increase in one component by decreasing another component.Below, we illustrate this movement for the pair "Upper" and "Workers", where we see that the share of "Upper" moves towards its summit when $h$ increases and the share of "Worker moves away from the summit.```{r}#| fig-show: "hold"paris_pcseq4 <-CoDa_path(comp_direc =exp(c(1,-1,0,0)),comp_from = paris$PROFCAT,add_opposite =FALSE, # only draw the line for one sidestep_size = .1) opar <-par(mfrow =c(1,3),mai =c(0,.6,0,.6), oma =c(0,0,0,0), mar =c(2,3,0,3))plot.acomp(paris_pcseq4[,-4])plot.acomp(paris_pcseq4["0",-4, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(c(1,1,1), col ="blue", add =TRUE, pch =8, cex =2)plot.acomp(paris_pcseq4[,-1])plot.acomp(paris_pcseq4["0",-1, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(c(1,1,1), col ="blue", add =TRUE, pch =8, cex =2)plot.acomp(paris_pcseq4[,-2])plot.acomp(paris_pcseq4["0",-2, drop =FALSE], col ="red", add =TRUE, pch =3, cex =2)plot.acomp(c(1,1,1), col ="blue", add =TRUE, pch =8, cex =2)par(opar)```One interesting aspect of this path is that it leads to an axial symmetry in the shares of the components "Other" and "Retired".```{r}paris_pcseq4 <-CoDa_path(comp_direc =exp(c(1,-1,0,0)),comp_from = paris$PROFCAT,add_opposite =TRUE,step_size = .1) names(paris_pcseq4) <-colnames(PROFCAT)window <-as.character(seq(-70,70,by =1))barplot(acomp(paris_pcseq4[window,,drop=FALSE]), xlab ="value of h")```The shares of the components "Other" and "Retired" are maximized at the point where those of "Upper" and "Workers" become equal.```{r}equal_point <-which.min(abs(paris_pcseq4$Workers - paris_pcseq4$Upper))focus_seq <-seq(-60,60)plot(x = focus_seq,y = paris_pcseq4$Other[focus_seq + equal_point], type ="l", col ="blue", xlab =NA, ylab =NA)lines(x = focus_seq,y = paris_pcseq4$Retired[focus_seq + equal_point], col ="red")title(xlab ='steps around the point of equality in "Upper" and "Workers"',ylab ='shares of "Other" and "Retired"')```Another interesting point is that in clr space, the components "Other" and "Retired" are unaffected along this path.```{r}HT_dir_comp <-clr(rbind(head(paris_pcseq4, 2), paris_pcseq4["0",], tail(paris_pcseq4, 2)))rownames(HT_dir_comp) <-paste0("compensation direction: ", rownames(HT_dir_comp))HT_dir_vert <-clr(rbind(head(paris_pcseq1, 2), paris_pcseq4["0",], tail(paris_pcseq1, 2)))rownames(HT_dir_vert) <-paste0("vertex direction: ", rownames(HT_dir_vert))rbind(HT_dir_comp, HT_dir_vert)```The functional representation of this linear is shown below.```{r}#| fig-show: holdparis_pcseq4_gg <-data.frame(paris_pcseq4)colnames(paris_pcseq4_gg) <-paste0("PC", 1:4)ggplot(paris_pcseq4_gg) +geom_vline(xintercept = paris$PROFCAT[[1]]) +geom_vline(xintercept =1/4, col ="grey55") +geom_line(aes(x = PC1, y = PC1, col ="1", lty ="1"), lwd = .51) +geom_line(aes(x = PC1, y = PC2, col ="2", lty ="2"), lwd = .51) +geom_line(aes(x = PC1, y = PC3, col ="3", lty ="3"), lwd = .51) +geom_line(aes(x = PC1, y = PC4, col ="4", lty ="4"), lwd = .51) +geom_hline(yintercept =1, col ="black") +geom_boxplot(data =data.frame(PC1 = mun2model$PROFCAT[,1]),mapping =aes(x = PC1, y =1+ .0625),varwidth =TRUE,size = .5,col ="red",alpha = .25,width = .05) +scale_x_continuous(labels = scales::percent,sec.axis =dup_axis(name="Boxplot of % in professional category: Upper")) +scale_y_continuous(labels = scales::percent,breaks =seq(0,1,.25)) +scale_color_manual(breaks =c("1", "2", "3", "4"),values =c("Red", "Orange", "Dark Blue", "Yellow"),name ="professional\ncategory",labels =colnames(PROFCAT)) +scale_linetype_manual(breaks =c("1", "2", "3", "4"),values =1:4,name ="professional\ncategory",labels =colnames(PROFCAT)) +labs(x ="% in professional category: Upper",y ="% in professional category") +coord_equal() +theme_bw()```# Y-scalar model interpretationsIn this section, we look at a model that explains the turnout rate, which we treat as a real variable.```{r}mun2model$TURNOUT <-1- mun2model$VOTE[,6]TURNOUT_model <-lmCoDa( TURNOUT ~log(POP_DENSITY) +ilr(PROFCAT) +1,data = mun2model,weights = REGISTERED)summary(TURNOUT_model)```The analysis of variance is given by```{r}aov <-anova(TURNOUT_model)aov$`Sum Sq`<-round(aov$`Sum Sq`, 2)aov$`Mean Sq`<-round(aov$`Mean Sq`, 2)aov$`F value`<-round(aov$`F value`, 2)kable(aov, caption ="Analysis of variance table for the X-compositional model") |>kable_classic()kable(aov, caption ="Analysis of variance table for the X-compositional model", label ="aov_TURNOUT",format ="latex", booktabs =TRUE) |>kable_styling(latex_options ="hold_position") |>save_kable(file =here("out/tables/aov_TURNOUT.tex"))```## Interpreting the impact of compositional X### Semi-elasticitiesIn this model, the clr transformed coefficients are equal to the semi-elasticity of the shares.```{r}#| fig-show: hold#| warning: falseclr_confint <-confint(TURNOUT_model, "PROFCAT")ggplot(clr_confint) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=X, xend=X, y=0, yend=EST, col = X)) +geom_point(aes(x=X, y = EST, col = X)) +geom_point(aes(x=X, y =`2.5 %`, col = X), pch ="[", stroke =3) +geom_point(aes(x=X, y =`97.5 %`, col = X), pch ="]", stroke =3) +labs(y ="semi-elasticities values", x ="professional category") +theme(legend.position ="none")ggsave(here("out/figures", "yscalar_elasticity_profcat.pdf"), height =2, width =7)```### Semi-elasticities differences and simplex coeffiecientsWe may nonetheless compute the differences in semi-elasticities and also bring the coefficients back into the simplex.```{r}#| warning: falseclr_confint <-confint(TURNOUT_model, "PROFCAT")# to compute the interval of differences we require # the clr covariance matrixvarcov_Y <- TURNOUT_model$trSry$VARCOV_CLR$"TURNOUT"varcov_X <- TURNOUT_model$trSry$VARCOV_CLR$"ilr(PROFCAT)"varcov_b <- varcov_X *as.vector(varcov_Y)clr_b <-coef(TURNOUT_model, space ="clr", split =TRUE)[["ilr(PROFCAT)"]]varcov_diff <- clr_diff <-rep(0, nrow(varcov_b))r <-1# referencefor (i inseq_along(varcov_diff)) { clr_diff[i] <- clr_b[i] - clr_b[r] varcov_diff[i] <- varcov_b[r,r] + varcov_b[i,i] -2* varcov_b[i,r]}# given the sample size we can use normal approximations for to obtain confidence intervalsdiff_confint <- clr_confintdiff_confint$EST <- clr_diffdiff_confint$SD <-sqrt(varcov_diff)diff_confint[["2.5 %"]] <- clr_diff +qnorm(0.025) * diff_confint$SD diff_confint[["97.5 %"]] <- clr_diff +qnorm(0.975) * diff_confint$SD # # we do not have confidence intervals for simplex valued parameterssimplex_confint <- clr_confintsimplex_confint$EST <-exp(clr_b)/sum(exp(clr_b))simplex_confint[["SD"]] <- simplex_confint[["2.5 %"]] <- simplex_confint[["97.5 %"]] <-NAcWay <-"clr / semi-elasticity"dWay <-"difference in clr / se"sWay <-"simplex"threeWays4params <-rbind(cbind(Way = cWay , cen =0 , clr_confint),cbind(Way = dWay , cen =0 , diff_confint),cbind(Way = sWay, cen = .25, simplex_confint))threeWays4params$X <-factor(threeWays4params$X, levels =rev(rownames(clr_b)))tail2 <-nrow(threeWays4params) +c(1,2)threeWays4params <-rbind(threeWays4params, NA)threeWays4params <-rbind(threeWays4params, NA)threeWays4params$EST[tail2] <-c(.35,.15)threeWays4params$Way[tail2] <- sWaythreeWays4params$X2 <- threeWays4params$XthreeWays4params$X2[tail2] <-NAthreeWays4params$X[tail2] <-"Upper"ggplot(threeWays4params) +theme_bw() +coord_flip() +geom_hline(aes(yintercept = cen), col ="grey50", lty ="dotted") +geom_segment(aes(x=X, xend=X, y=cen, yend=EST, col = X2)) +geom_point(aes(x=X, y = EST, col = X2)) +geom_point(aes(x=X, y =`2.5 %`, col = X2), pch ="[", stroke =3) +geom_point(aes(x=X, y =`97.5 %`, col = X2), pch ="]", stroke =3) +scale_color_discrete_qualitative(na.translate =FALSE) +facet_wrap("Way", scales ="free_x") +labs(y ="value", x ="professional\ncategory") +theme(legend.position ="none")ggsave(here("out/figures", "yscalar_1clr2diff3simplex_profcat.pdf"), height =1.65, width =7)```### Variation TableThe table below illustrates how the dependent variable TURNOUT would react to a change in the composition of the professional categories, where we focus on the municipality of Paris.```{r}format_VarTab <-function(VT, digits =4, digitsU =max(digits -2, 0), digitsP = digits, digitsPP = digits) {# Print variation tables in a nicer way VT <-t(VT) VT <-as.data.frame(round(VT, digits)) v <-"Variation in units"if (!is.null( VT[[v]])) VT[[v]] <-round(VT[[v]], digitsU) v <-"Variation in %"if (!is.null( VT[[v]])) VT[[v]] <-paste0(round(VT[[v]], digitsP), " %") v <-"Variation in % points"if (!is.null( VT[[v]])) VT[[v]] <-paste0(round(VT[[v]], digitsPP), " %")t(VT)}``````{r}paris_num <-which(mun2model$ID_MUN == paris_id)VariationTable4PCparis <-function(Xdir) VariationTable(object = TURNOUT_model,Xvar ="PROFCAT",Xdir = Xdir,inc_size = .1,obs = paris_num,normalize_Xdir =FALSE)paris_varTab <-cbind(VariationTable4PCparis("Upper"),VariationTable4PCparis(dir_gen))row.names(paris_varTab)[4] <-"Variation in % points"paris_varTab[4,] <- paris_varTab[5,] *100paris_varTab <- paris_varTab[-5,]tab_atr <-attributes(VariationTable4PCparis("Upper"))colnames(paris_varTab) <-c("Direction to vertex", "General direction")format_VarTab(paris_varTab, 3,digitsP =3, digitsPP =3) |>kable(caption ="Variation of the turnout when changing PC in different directions", digits =4, booktabs =TRUE) |>add_footnote(sprintf("h = %s, alpha=%s%%", tab_atr$inc_size, round(100*tab_atr$inc_rate,2))) ``````{r}#| include: falseformat_VarTab(paris_varTab, 3,digitsP =2, digitsPP =2) |>kable(caption ="Impacts of change in PC on the turnout", digits =4, booktabs =TRUE,format ="latex",label ="VarTab_TURNOUT_XeXu_change", linesep ="") |># add_footnote(sprintf("h = %s, alpha=%s%%", tab_atr$inc_size, round(100*tab_atr$inc_rate,2))) |> kable_styling(latex_options ="hold_position") |>save_kable(here("out/tables/VarTab_TURNOUT_XeXu_change.tex"))```### Variation scenarioWe now evaluate, for Paris, how changes in the PC composition would impact the vote shares of each candidate.In the following, we consider scenarios of changes defined by the linear paths illustrated in the previous section.The graphic below shows that all scenarios pass through the observed PC composition for Paris and lead to a predicted turnout of about $72.9\%$.In the hypothetical scenario where Paris has shares in each professional category, the turnout is predicted to drop to about $67\%$.```{r}#| fig-show: hold#| warning: false# first scenario dataparis_pred1 <- paris[rep(1,nrow(paris_pcseq1)),]paris_pred1$PROFCAT <-as.matrix(paris_pcseq1)paris_pred1$VOTE <-NULLparis_pred1$PRED_TURNOUT <-predict(TURNOUT_model, newdata = paris_pred1) paris_pred1$H_SEQ <-row.names(paris_pred1) <-row.names(paris_pcseq1)paris_pred1$F_SHARE <- paris_pred1$PROFCAT[,"Upper"]paris_pred1_gg <-melt(data =data.table(paris_pred1),id.vars =c("H_SEQ","F_SHARE","PRED_TURNOUT"),measure.vars =paste0("PROFCAT.", colnames(PROFCAT)))# second scenario dataparis_pred2 <- paris[rep(1,nrow(paris_pcseq2)),]paris_pred2$PROFCAT <-as.matrix(paris_pcseq2)paris_pred2$VOTE <-NULLparis_pred2$PRED_TURNOUT <-predict(TURNOUT_model, newdata = paris_pred2) paris_pred2$H_SEQ <-row.names(paris_pred2) <-c(row.names(paris_pcseq2))paris_pred2$F_SHARE <- paris_pred2$PROFCAT[,"Upper"]paris_pred2_gg <-melt(data =data.table(paris_pred2),id.vars =c("H_SEQ","F_SHARE","PRED_TURNOUT"),measure.vars =paste0("PROFCAT.", colnames(PROFCAT)))# third scenario dataparis_pred3 <- paris[rep(1,nrow(paris_pcseq3) +1),]paris_pred3$PROFCAT <-rbind(as.matrix(paris_pcseq3),rep(.25,4)) # add the originparis_pred3$VOTE <-NULLparis_pred3$PRED_TURNOUT <-predict(TURNOUT_model, newdata = paris_pred3) paris_pred3$H_SEQ <-row.names(paris_pred3) <-c(row.names(paris_pcseq3),"101")paris_pred3$F_SHARE <- paris_pred3$PROFCAT[,"Upper"]paris_pred3_gg <-melt(data =data.table(paris_pred3),id.vars =c("H_SEQ","F_SHARE", "PRED_TURNOUT"),measure.vars =paste0("PROFCAT.", colnames(PROFCAT)))paris_pred_gg <-rbind(cbind("DIR"="Direction to vertex", paris_pred1_gg),cbind("DIR"="General direction", paris_pred2_gg),cbind("DIR"="Origin direction", paris_pred3_gg))paris_pred_gg[,variable :=str_remove(variable,"PROFCAT.")]paris_pred_gg[,variable :=factor(variable,colnames(PROFCAT))]hline <-data.frame(INT =c(paris_pred1[c("0", "0", "0"), "PRED_TURNOUT"],paris_pred3["101", "PRED_TURNOUT"]),DIR =c("Direction to vertex", "General direction", "Origin direction", "Origin direction"),LAB =c("(Paris)", "(Paris)", "(Paris)", "(origin)"),YYY =c(.85),JJJ =c(-.5,-.5,-.5, 1.5))ggplot(paris_pred_gg) +geom_vline(aes(xintercept = INT), data = hline) +geom_text(aes(x = INT, y = YYY, label = LAB, vjust = JJJ), data = hline) +geom_line(aes(y = value, x = PRED_TURNOUT, col =variable, lty = variable)) +scale_y_continuous(breaks =seq(0,10)/10,labels = scales::percent) +scale_x_continuous(labels = scales::percent,sec.axis =dup_axis(name =NULL), limits =c(.5,.9)) +coord_flip() +theme_bw() +labs(y ="share in professional category", x ="predicted turnout",col ="professional\ncategory", lty ="professional\ncategory") +facet_wrap("DIR", ncol =1, scales ="free_x")```For the first two scenarios, the same graph becomes:```{r}#| warning: falseggplot(paris_pred_gg[DIR !="Origin direction",]) +geom_vline(aes(xintercept = INT), data = hline[1:2,]) +geom_text(aes(x = INT, y = YYY, label = LAB, vjust = JJJ), data = hline[1:2,]) +geom_line(aes(y = value, x = PRED_TURNOUT, col =variable, lty = variable)) +scale_y_continuous(breaks =seq(0,10)/10,labels = scales::percent) +scale_x_continuous(labels = scales::percent,sec.axis =dup_axis(name =NULL), limits =c(.5,.9)) +theme_bw() +coord_flip() +labs(y ="share in professional category", x ="predicted turnout",col ="professional\ncategory", lty ="professional\ncategory") +facet_wrap("DIR", ncol =1, scales ="free_x")```An alternative and maybe more straightforward way to show the same information is to look simultaneously at the changes of the dependent and independent variables as a function of $h$.```{r}paris_pred_gg2 <-copy(paris_pred_gg[DIR !="Origin direction",])paris_pred_gg2[, COMPO :="Change in X"]paris_pred_gg3 <-copy(paris_pred_gg2[variable =="Upper",])paris_pred_gg3[, COMPO :="Change in Y"]paris_pred_gg3[, variable :=NA]paris_pred_gg3[,value := PRED_TURNOUT]paris_pred_ggX <-rbind(paris_pred_gg2, paris_pred_gg3)paris_pred_ggX[, COMPO :=factor(COMPO, c("Change in Y", "Change in X"))]paris_pred_ggX[,H_SEQ :=as.integer(H_SEQ)]rm(paris_pred_gg2, paris_pred_gg3)ggplot(paris_pred_ggX[-30<= H_SEQ & H_SEQ <=30]) +geom_vline(xintercept =0) +# geom_text(aes(x = INT, y = YYY, label = LAB, vjust = JJJ), data = hline[1:2,]) +geom_line(aes(y = value, x = H_SEQ/10, col =variable, lty = variable)) +scale_y_continuous(breaks =seq(0,10)/10,labels = scales::percent) +scale_colour_manual(values = scales::hue_pal()(4), na.value ="#000000", breaks =levels(paris_pred_ggX$variable)) +scale_linetype_manual(values =1:4, na.value =1, , breaks =levels(paris_pred_ggX$variable)) +# scale_x_continuous(labels = scales::percent,sec.axis = dup_axis(name = NULL), limits = c(.5,.9)) +theme_bw() +labs(y ="shares of\nprofessional categories (bottom) and predicted turnout (top)",x ="value of h",col ="professional\ncategory", lty ="professional\ncategory") +facet_grid(COMPO ~ DIR, space ="free_y", scales ="free")ggsave(here("out/figures/x_compo_variation_scatter.pdf"), height =4, width =7)```# Y-compositional model interpretationsIn this section, we look at the case of a Y-compositional model that explains the entire composition of votes.```{r}VOTE_model <-lmCoDa(ilr(VOTE) ~log(POP_DENSITY) +ilr(PROFCAT),data = mun2model)summary(VOTE_model)``````{r}aov <-anova(VOTE_model)aov$`approx F`<-round(aov$`approx F`)aov$Pillai <-round(aov$Pillai, 3)kable(aov, caption ="Analysis of variance table for the y-compositional model") |>kable_classic()kable(aov, caption ="Analysis of variance table for the y-compositional model", label ="aov_VOTE", format ="latex", booktabs =TRUE) |>kable_styling(latex_options ="hold_position") |>save_kable(file =here("out/tables/aov_VOTE.tex"))```## Interpreting the impact of a scalar X### Semi-elasticities differencesThe clr coefficients correspond to a difference from the mean elasticity.```{r}#| fig-show: hold#| warning: falseflevels <-c("-Mean semi elasticity", rev(rename_elec))clr_confint <-confint(VOTE_model, "log(POP_DENSITY)")clr_confint$Y <-factor(clr_confint$Y, flevels)clr_confintB <- clr_confint[1,]clr_confintB$Y <-factor("-Mean semi elasticity", levels = flevels)clr_confintB[,3:6] <-NAclr_T <-coef(VOTE_model, "clr")ME <-as(fitted(VOTE_model, "simplex"),"matrix") %*%t(clr_T[-1,])ME <-melt(data.table(ME), variable.name ="X", value.name ="EST")ME[,Y :=factor("-Mean semi elasticity", levels = flevels)]ggplot(rbind(clr_confint,clr_confintB)) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=EST, col = Y)) +geom_point(aes(x=Y, y = EST, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +geom_violin(aes(y = EST, x = Y),data = ME[X=="log(POP_DENSITY)",]) +labs(y ="clr parameter values & confidence intervals\ndistribution of negative mean semi elasticity", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "clr_b_popdens_with_details.pdf"), height =2.25, width =7)``````{r}#| include: falseggplot(rbind(clr_confint,clr_confintB)) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=EST, col = Y)) +geom_point(aes(x=Y, y = EST, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +geom_violin(aes(y = EST, x = Y),data = ME[X=="log(POP_DENSITY)",]) +labs(y ="clr parameter values", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "clr_b_popdens.pdf"), height =1.75, width =7)```We can use them to construct differences between the elasticities of two components of the dependent shares with respect to the same explanatory variable.```{r}#| fig-show: hold#| warning: falseclr_confint <-confint(VOTE_model, "log(POP_DENSITY)",y_ref =1)clr_confint$Y <-factor(clr_confint$Y, flevels)clr_confintB <- clr_confint[1,]clr_confintB[,3:6] <-NAggplot(rbind(clr_confint,clr_confintB)) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=DIFF, col = Y)) +geom_point(aes(x=Y, y = DIFF, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +labs(y ="semi-elasticity differences & confidence intervals\n(in the denominator is Macron)", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "SE_diff_popdens_with_details.pdf"), height =2, width =7)``````{r}#| include: falseggplot(rbind(clr_confint,clr_confintB)) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=DIFF, col = Y)) +geom_point(aes(x=Y, y = DIFF, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +labs(y ="semi-elasticity differences", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "SE_diff_popdens.pdf"), height =1.75, width =7)```### Variation tableWe can use Taylor approximations to evaluate how the voting behavior reacts to a small change in the number of registered voters.Here, we use increment `log(POP_DENSITY)` by $0.1$, which means that population density increases by about 10%.```{r}paris_num <-which(mun2model$ID_MUN == paris_id)paris_varTab <-VariationTable(VOTE_model, "log(POP_DENSITY)",inc_size = .1, obs = paris_num, Ytotal = mun2model$REGISTERED[paris_num])format_VarTab(paris_varTab, 4, 0, 2, 2) |>kable(caption ="Increase of the population density by 10%", digits =4, booktabs =TRUE)``````{r}format_VarTab(paris_varTab, 4, 0, 2, 2) |>kable(caption ="Increase of the population density by 10\\%", digits =4, booktabs =TRUE,format ="latex", label ="VarTab_VOTE_logPDENS_change", linesep ="") |>kable_styling(latex_options ="hold_position") |>save_kable(here("out/tables/VarTab_VOTE_logPDENS_change.tex"))```It is easy to verify that the variations in % points and units compensate each other.```{r}t(t(round(rowSums(paris_varTab),10)))```### Variation scenariosIn this scenario, the population density changes on a grid that is regular in logs.Even extreme changes such as a tenfold population increase or reduction to one-tenth would lead to little variation.When the population of Paris shrinks to $0.1\%$ of its original size, we see considerable differences.```{r}#| fig-show: holdparis_size_scenario <- paris[rep(1,101),]paris_size_scenario$POP_DENSITY <- paris_size_scenario$POP_DENSITY *exp(seq(-log(1000),log(10),length.out =101))paris_size_scenario$VOTE <-as(predict(VOTE_model,space ="simplex", newdata = paris_size_scenario), "matrix")colnames(paris_size_scenario$VOTE) <-colnames(VOTE)paris_size_scenario_gg <-melt(data =data.table(paris_size_scenario),id.vars ="POP_DENSITY",measure.vars =make.names(paste0("VOTE.",colnames(VOTE))))unmake_votenames <-structure(colnames(VOTE), names =make.names(paste0("VOTE.",colnames(VOTE))))paris_size_scenario_gg[,variable :=factor(unmake_votenames[variable],unmake_votenames)]ggplot(paris_size_scenario_gg) +geom_vline(xintercept = paris$POP_DENSITY) +geom_line(aes(x = POP_DENSITY, y = value, col = variable, lty = variable)) +scale_x_continuous(trans ="log10") +scale_y_continuous(labels = scales::percent) +labs(y ="vote share", x ="population density", col ="candidate\nor block", lty ="candidate\nor block") +theme_bw()ggsave(here("out/figures","y_compo_variation_scatter.pdf"), height =2, width =7)```## Interpreting the impact of an X-compositional variable### Elasticities differenceFor X-compositional variables, the clr coefficients also correspond to differences in mean elasticities, where these means are computed for each component of $X$ over all dependent shares.```{r}#| fig-show: hold#| warning: falseflevels2 <-c("-Mean elasticity",flevels)clr_confint <-confint(VOTE_model, "PROFCAT")clr_confint$X <-factor(clr_confint$X, colnames(PROFCAT))clr_confint$Y <-factor(clr_confint$Y, levels = flevels2)clr_confint_all <- clr_confintclr_confint_all$Y <-factor("-Mean elasticity", flevels2)clr_confint_all$SD <- clr_confint_all$ESTclr_confint_all$EST <-NAclr_confint_all$`2.5 %`<-NAclr_confint_all$`97.5 %`<-NAclr_confint_all <-unique(clr_confint_all)levels(ME$Y)[1] <-"-Mean elasticity"ggplot(rbind(clr_confint,clr_confint_all)) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=EST, col = Y)) +geom_point(aes(x=Y, y = EST, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +geom_violin(aes(y = EST, x = Y),data = ME[X!="log(POP_DENSITY)"]) +facet_wrap("X") +labs(y ="parameter values & confidence intervals\ndistribution of negative mean elasticity", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "clr_B_profcat_with_details.pdf"), height =4, width =7)``````{r}#| include: falseggplot(rbind(clr_confint,clr_confint_all)) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=EST, col = Y)) +geom_point(aes(x=Y, y = EST, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +scale_y_continuous(breaks =seq(-1,1,0.1),minor_breaks =seq(-1,1,0.05)) +geom_violin(aes(y = EST, x = Y),data = ME[X!="log(POP_DENSITY)"]) +facet_grid(~ X, space ="free", scales ="free") +labs(y ="parameter values", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "clr_B_profcat.pdf"), height =2, width =7)``````{r}#| fig-show: hold#| warning: falseclr_confint <-confint(VOTE_model, "PROFCAT",y_ref =1)clr_confint$Y <-factor(clr_confint$Y,levels = flevels)clr_confint$X <-factor(clr_confint$X,levels =colnames(PROFCAT))ggplot(clr_confint) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=DIFF, col = Y)) +geom_point(aes(x=Y, y = DIFF, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +facet_wrap("X") +labs(y ="elasticity differences & confidence intervals\n(in the denominator is Macron)", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "E_diff_profcat_with_details.pdf"), height =3.5, width =7)``````{r}#| include: falseggplot(clr_confint) +theme_bw() +coord_flip() +geom_hline(yintercept =0) +geom_segment(aes(x=Y, xend=Y, y=0, yend=DIFF, col = Y)) +geom_point(aes(x=Y, y = DIFF, col = Y)) +geom_point(aes(x=Y, y =`2.5 %`, col = Y), pch ="[", stroke =3) +geom_point(aes(x=Y, y =`97.5 %`, col = Y), pch ="]", stroke =3) +scale_y_continuous(breaks =seq(-1,1,0.1),minor_breaks =seq(-1,1,0.05)) +facet_grid(~ X, space ="free", scales ="free") +labs(y ="elasticity differences", x =NULL) +theme(legend.position ="none")ggsave(here("out/figures", "E_diff_profcat.pdf"), height =2, width =7)```### Variation tableSimilar to the previous variation table, we evaluate how the voting behavior reacts to a slight change in the explanatory composition.However, since this variable is multivariate, we must specify the change direction.For a change towards the vertex "Upper" we get the following:```{r}VariationTable4PCparis <-function(Xdir) VariationTable(object = VOTE_model,Xvar ="PROFCAT",Xdir = Xdir,inc_size = .1,obs = paris_num,normalize_Xdir =FALSE,Ytotal = mun2model$REGISTERED[paris_num])paris_varTab <-VariationTable4PCparis(Xdir ="Upper")tab_atr <-attributes(paris_varTab)paris_varTab <-format_VarTab(paris_varTab, 3, 0,digitsP =2,digitsPP =2)paris_varTab |>kable(caption ="Change of (PC) in direction of the vertex \"Upper\"", digits =4, booktabs =TRUE) |>add_footnote(sprintf("h = %s, alpha=%s%%", tab_atr$inc_size, round(100*tab_atr$inc_rate,2)[1])) ``````{r}#| include: falseparis_varTab |>kable(caption ="Change of PC in direction of the vertex \"Upper\"", digits =4, booktabs =TRUE,format ="latex", label ="VarTab_VOTE_Xe_change", linesep ="") |>kable_styling(latex_options ="hold_position") |># add_footnote(sprintf("h = %s", tab_atr$inc_size)) |> save_kable(here("out/tables/VarTab_VOTE_Xe_change.tex"))```For the general direction, we get the following:```{r}paris_varTab <-VariationTable4PCparis(dir_gen)paris_varTab <-format_VarTab(paris_varTab, 3, 0,digitsP =2,digitsPP =2)paris_varTab |>kable(caption ="Change of PC in a general direction", digits =4, booktabs =TRUE)``````{r}#| include: falseparis_varTab |>kable(caption ="Change in a general direction", digits =4, booktabs =TRUE ,format ="latex", label ="VarTab_VOTE_Xu_change", linesep ="") |>kable_styling(latex_options ="hold_position") |>save_kable(file =here("out/tables/VarTab_VOTE_Xu_change.tex"))```### Variation scenariosLet us turn to the changes induced by the variation of the socio-professional composition in Paris.Here, we show the changes in the explanatory composition on top of the changes in the predicted shares.The graphic below displays the changes as a function of the category "Upper".```{r}paris_pred1$VOTE <-as(predict(VOTE_model, space ="simplex", newdata = paris_pred1),"matrix")paris_pred2$VOTE <-as(predict(VOTE_model, space ="simplex", newdata = paris_pred2),"matrix")unmake_pc_vote_names <-c(structure(paste0("(Vote) ", unmake_votenames), names =names(unmake_votenames)),structure(paste0("(PC) ", colnames(PROFCAT)), names =paste0("PROFCAT.", colnames(PROFCAT))))legend_title <-"candidate or block\n(top six)\n\nprofessional category\n(bottom four)"paris_pred1_gg <-melt(data =data.table(paris_pred1),id.vars =c("H_SEQ","F_SHARE","PRED_TURNOUT"),measure.vars =names(unmake_pc_vote_names))paris_pred2_gg <-melt(data =data.table(paris_pred2),id.vars =c("H_SEQ","F_SHARE","PRED_TURNOUT"),measure.vars =names(unmake_pc_vote_names))paris_pred_gg <-rbind(cbind("DIR"="Direction to vertex", paris_pred1_gg),cbind("DIR"="General direction", paris_pred2_gg))paris_pred_gg[,COMPO :=as.character(variable)]paris_pred_gg[,SAHRE_XY :=factor(unmake_pc_vote_names[COMPO], unmake_pc_vote_names)]paris_pred_gg[,COMPO :=fcase(grepl("^VOTE.", variable), "Changes in Y",grepl("^PROFCAT.", variable), "Changes in X")]paris_pred_gg[,COMPO :=factor(COMPO, levels =c("Changes in Y", "Changes in X"))]hline <-data.frame(INT =c(paris$PROFCAT[c(1,1), "Upper"],.25),DIR =c("Direction to vertex","General direction", "General direction"))pclim <-c(.05, .95)ggplot(paris_pred_gg) +geom_vline(aes(xintercept = INT), data = hline[1:2,]) +geom_line(aes(y = value, x = F_SHARE, col =SAHRE_XY, lty = SAHRE_XY)) +scale_y_continuous(breaks =seq(1,9, by =2)/10, labels = scales::percent) +scale_x_continuous(breaks =seq(1,9, by =2)/10, labels = scales::percent) +theme_bw() +labs(y ="shares of\nprofessional categories (bottom) and predicted votes (top)",x ="share of professional category Upper",col = legend_title, lty = legend_title) +facet_grid(rows =vars(COMPO), cols =vars(DIR), shrink =TRUE, scales ="free_y")ggsave(here("out/figures", "yx_compo_variation_scatter.pdf"), height =4.5, width =7)```We can draw the same graphic as a function of the value of $h$.```{r}ggplot(paris_pred_gg[,]) +geom_vline(aes(xintercept = B), data =cbind(hline[1:2,],B =0)) +geom_line(aes(y = value, x =as.integer(H_SEQ)/10, col =SAHRE_XY, lty = SAHRE_XY)) +scale_y_continuous(breaks =seq(1,9, by =2)/10, labels = scales::percent) +# scale_x_continuous(breaks = seq(1,9, by = 2)/10, labels = scales::percent) +theme_bw() +labs(y ="shares of\nprofessional categories (bottom) and predicted votes (top)",x ="value of h",col = legend_title, lty = legend_title) +facet_grid(rows =vars(COMPO), cols =vars(DIR), shrink =TRUE, scales ="free_y", space ="free_y")ggsave(here("out/figures", "yx_compo_variation_scatter_by_h.pdf"), height =4.5, width =7)```We could also illustrate this scenario in a matrix of ternary diagrams.However, in this representation, we are limited to subcompositions of three components.```{r}#| fig-show: holdopar <-par(mfrow =c(2,2), mai =c(.1,0,.2,0))filter1 <- paris_pred1$H_SEQ[between(paris_pred1$F_SHARE, pclim[1], pclim[2])]filter1 <-as.numeric(filter1)filter1 <-min(max(-filter1), max(filter1))filter1 <-intersect(paris_pred1$H_SEQ, seq(-filter1,filter1))filter1 <-as.character(seq(-25,25))colors1 <-diverging_hcl(length(filter1),palette ="Blue-Red-3")plot(acomp(paris_pred1$PROFCAT[filter1,1:3]), col = colors1, pch =19)title("Change X (direction to vertex)")plot(acomp(paris_pred2$PROFCAT[filter1,1:3]), col = colors1, pch =19)title("Change X (general direction)")plot(acomp(paris_pred1$VOTE[filter1,1:3]), col = colors1, pch =19)plot(acomp(paris_pred2$VOTE[filter1,1:3]), col = colors1, pch =19)par(opar)``````{r}#| include: falseopar <-par(mfrow =c(1,2),mai =c(0,.6,0,.6), oma =c(0,0,0,0), mar =c(2,3,0,3))plot(acomp(paris_pred1$PROFCAT[filter1,1:3]), col = colors1, pch =19)plot(acomp(paris_pred1$VOTE[filter1,1:3]), col = colors1, pch =19)save_baseplot(pdf, here("out/figures", "yx_compo_variation_ternary_Xe.pdf"), height =3, width =7, bg ="transparent")opar <-par(mfrow =c(1,2),mai =c(0,.6,0,.6), oma =c(0,0,0,0), mar =c(2,3,0,3))plot(acomp(paris_pred2$PROFCAT[filter1,1:3]), col = colors1, pch =19)plot(acomp(paris_pred2$VOTE[filter1,1:3]), col = colors1, pch =19)save_baseplot(pdf, here("out/figures", "yx_compo_variation_ternary_Xu.pdf"), height =3, width =7, bg ="transparent")par(opar)```### Direction of maximal share ratio variationFor each dependent share ratio, the direction of maximal variation is given by the squared norm of the difference vector for two rows of the parameter matrix $B$, where the row indexes correspond to the indexes of the two components that appear in the share ratio.With this in mind we can examine how the share ratio react to a change in their respective maximal variation direction.```{r}B <-t(coef(VOTE_model,space ="clr", split =TRUE)[["ilr(PROFCAT)"]])Dy <-nrow(B)max_diffs <-expand.grid(yj =rownames(B), yl =rownames(B))max_diffs <- max_diffs[as.integer(max_diffs$yj) >as.integer(max_diffs$yl),]max_diffs[["diff"]] <-unlist(Map(function(yj, yl) sqrt(crossprod(B[as.integer(yj),] - B[as.integer(yl),])),max_diffs$yj,max_diffs$yl))MatMaxDiff <-matrix(nrow = Dy, ncol = Dy, dimnames =list(rownames(B),rownames(B)))MatMaxDiff[lower.tri(MatMaxDiff)] <- max_diffs$diffggplot(max_diffs, aes(y = yl, x = yj)) +geom_tile(aes(fill = diff)) +geom_text(aes(label =round(diff,3), size =abs(diff)),fontface ="bold") +scale_size_continuous(range =c(3,4), guide ="none") +scale_x_discrete(position ="top") +scale_y_discrete(limits=rev) +scale_fill_continuous_sequential("Reds") +labs(x ="", y ="") +guides(fill =guide_colorbar(barwidth = .5, barheight =10,title ="value")) +theme_minimal() +theme(axis.text.x =element_text(angle =15)) +coord_equal()ggsave(here("out/figures", "yx_compo_maxdiff_matrix.pdf"), height =3, width =4.5)```The above shows that the professional categories is most useful to discriminate Le Pens share from that of the others.Let us now look at the variation scenario in the direction of maximal variation for the shares of the "Left" and "Le Pen", which is the strongest opposition in our example.Additionally, we look at "No Vote" vs "Le Pen", which is the weakest opposition that invloves "Le Pen" and "Right vs Mélenchon", which is the overall weakest opposition.```{r}Mdiff_VS <-function(den ="Left", num ="Le Pen") { max_diff_dir <-exp(B[num,] - B[den,]) max_diff_dir <- max_diff_dir/sum(max_diff_dir) max_diff_scenario <-VariationScenario(VOTE_model,Xvar ="PROFCAT", Xdir = max_diff_dir) max_diff_scenario$H_SEQ <-rownames(max_diff_scenario) max_diff_scenario$VOTE <- max_diff_scenario$Y max_diff_scenario$PROFCAT <- max_diff_scenario$X max_diff_scenario$SR <- max_diff_scenario$Y[,num] /max_diff_scenario$Y[,den] unmake_names <-paste0(num, " / ", den) unmake_names <-"(SR) Share ratio" unmake_names <-c("SR"= unmake_names, unmake_pc_vote_names) max_diff_scenario_gg <-melt(data =data.table(max_diff_scenario[,c("H_SEQ","VOTE","PROFCAT","SR")]),id.vars =c("H_SEQ"),measure.vars =names(unmake_names)) max_diff_scenario_gg <-cbind(max_diff_scenario_gg, "DIR"=paste0(num, " vs. ", den)) max_diff_scenario_gg[,COMPO :=as.character(variable)] max_diff_scenario_gg[,SAHRE_XY :=factor(unmake_names[COMPO], unmake_names)] max_diff_scenario_gg[,COMPO :=fcase( variable =="SR", "Share ratio",grepl("^VOTE.", variable), "Changes in Y",grepl("^PROFCAT.", variable), "Changes in X")] max_diff_scenario_gg[,COMPO :=factor(COMPO, levels =c("Share ratio", "Changes in Y", "Changes in X"))] max_diff_scenario_gg}max_diff_scenario_gg <-rbind(Mdiff_VS(den ="Left", num ="Le Pen"),Mdiff_VS(den ="No vote", num ="Le Pen"),Mdiff_VS(num ="Mélenchon", den ="Right"))legend_title3 <-"share ratio (SR)\n\ncandidate or block (Vote)\n\nprofessional category (PC)"ggplot(max_diff_scenario_gg) +geom_vline(aes(xintercept =0)) +geom_line(aes(y = value, x =as.integer(H_SEQ)/10, col =SAHRE_XY, lty = SAHRE_XY)) +scale_colour_discrete_qualitative() +theme_bw() +labs(y ="X-shares (bottom), Y-shares (middle), Y-share ratio (top)",x ="value of h",col = legend_title3, lty = legend_title3) +facet_grid(rows =vars(COMPO), cols =vars(DIR), scales ="free") +theme(legend.key.width =unit(3, "line"))ggsave(here("out/figures", "yx_compo_variation_scatter_by_h_maxdiff.pdf"), height =4.5, width =7)```The direction of changes in X can be easily computed from the parameter matrix:```{r}clrInv2 <-function(x) exp(x)/sum(exp(x))uStar <-function(u) round(clrInv2(u/sqrt(sum(u^2))),3)rbind(data.frame("Y-ratio"="Le Pen vs Left", t(uStar(B["Le Pen", ] - B["Left",]))),data.frame("Y-ratio"="Le Pen vs No Vote", t(uStar(B["Le Pen", ] - B["No vote",]))),data.frame("Y-ratio"="Mélenchon vs Right", t(uStar(B["Mélenchon",] - B["Right", ])))) ```### Share ratio elasticity tablesAnother way to interpret covariate impacts in XY-compositional models is to consider share ratio elasticities.These measure the % change in the ratio of two components of Y relative to the % change in the ratio of two components of X.```{r}sre_matrix_plot <-function(sre_tab, psize =5, bin_centers) { sre_tab$Yr <-paste0(as.character(sre_tab$Yj), " / ",as.character(sre_tab$Yl)) sre_tab$Xr <-paste0(as.character(sre_tab$Xj), " / ",as.character(sre_tab$Xl)) sre_tab$Yr <-factor(sre_tab$Yr, (unique(sre_tab$Yr))) sre_tab$Xr <-factor(sre_tab$Xr, rev(unique(sre_tab$Xr))) Dy <-length(unique(sre_tab$Yj)) Dx <-length(unique(sre_tab$Xj)) Xblocks <-seq(Dx-1) * (Dx -1) + .5 Yblocks <-seq(Dy-1) * (Dy -1) + .5if (missing(bin_centers)) { p <-ggplot(sre_tab) +geom_point(aes(x = Yr, y = Xr, col = SRE, pch = SRE >0), size = psize) +coord_fixed() +geom_hline(yintercept = Xblocks) +geom_vline(xintercept = Yblocks) +scale_color_continuous_diverging(na.value ="white", rev =TRUE) +scale_shape_manual(breaks =c(T,F), values =c(18,20), labels =c("postive", "negative")) +labs(x ="share ratios of Y", y ="share ratios of X", col ="share ratio\nelasticity", shape ="sign") +theme_bw() +guides(color =guide_colorbar(order =1)) +theme(axis.text.x =element_text(angle =90,hjust =1, vjust = .5),legend.justification =c(0,1)) }if (!missing(bin_centers)) { sre_tab$SREB <- sre_tab$SRE sre_tab$SREB[!is.finite(sre_tab$SREB)] <-NAif (is.null(bin_centers)) { breaks <-max(abs(sre_tab$SREB), na.rm =TRUE) breaks <-seq(from =-breaks, to = breaks, length.out =12) bin_centers <- breaks[-1] -diff(breaks)/2 } if (is.numeric(bin_centers)) { bin_centers <-sort(unique(bin_centers)) breaks <- bin_centers[-1] +diff(bin_centers)/2 breaks <-c(bin_centers[1:2] -diff(bin_centers[1:3])/2, breaks) check <-"Bins should contain minimum and maximum SRE values: %s"if (min(breaks) >min(sre_tab$SREB, na.rm =TRUE)) stop(sprintf(check, range(sre_tab$SREB, na.rm =TRUE)))if (max(breaks) <max(sre_tab$SREB, na.rm =TRUE)) stop(sprintf(check, range(sre_tab$SREB, na.rm =TRUE))) } sre_tab$SREB <-cut(sre_tab$SREB,breaks,labels = bin_centers, include.lowest =TRUE) nbins <-length(bin_centers) nnegs <-sum(bin_centers <0) bin_cols <-diverge_hcl(palette ="Blue-Red 3", n = nbins, rev =TRUE) bin_cols[nnegs +1] <-"grey" bin_sizes <-sqrt(abs(seq(-psize, psize, length.out = nbins))) +1 bin_sizes <- bin_sizes/max(bin_sizes)*psize bin_shapes <-unlist(Map("rep",c(20,18),c(nbins - nnegs,nnegs))) LT <-"share ratio\nelasticity\n± " LT <-paste0(LT,(bin_centers[2] - bin_centers[1])/2) p <-ggplot(sre_tab) +geom_point(aes(x = Yr, y = Xr, col = SREB, pch = SREB, size = SREB)) +coord_fixed() +geom_hline(yintercept = Xblocks) +geom_vline(xintercept = Yblocks) +scale_color_manual(breaks =rev(bin_centers), values = bin_cols, na.value ="white", drop=FALSE) +scale_size_manual( breaks =rev(bin_centers), values = bin_sizes, drop=FALSE) +scale_shape_manual(breaks =rev(bin_centers), values = bin_shapes, drop=FALSE) +labs(x ="share ratios of Y", y ="share ratios of X",col = LT, shape = LT, size = LT) +theme_bw() +guides(color =guide_legend()) +theme(axis.text.x =element_text(angle =-75,hjust =1, vjust = .5),legend.justification =c(0,0)) +scale_x_discrete(position ="top") }return(p)}```#### Fixed vertex directionsThe four tabs below illustrate the share ratio elasticities for the case where the direction is fixed towards one of the vertices of the PC composition.The blank rows in the graphic are because some share ratios do not change for this particular direction.::: {.panel-tabset}##### "Upper"```{r}#| warning: falsesre_tab <-ShareRatioElasticities(VOTE_model,Xvar ="PROFCAT", Xdir ="Upper")sre_matrix_plot(sre_tab, bin_centers =seq(-175,175,length.out =11)/1000)```##### "Workers"```{r}#| warning: falsesre_tab <-ShareRatioElasticities(VOTE_model,Xvar ="PROFCAT", Xdir ="Workers")sre_matrix_plot(sre_tab, bin_centers =seq(-10,10, by =2)*0.031)```##### "Other"```{r}#| warning: falsesre_tab <-ShareRatioElasticities(VOTE_model,Xvar ="PROFCAT", Xdir ="Other")sre_matrix_plot(sre_tab, bin_centers =seq(-10,10, by =2)*0.0085)```##### "Retired"```{r}#| warning: falsesre_tab <-ShareRatioElasticities(VOTE_model,Xvar ="PROFCAT", Xdir ="Retired")sre_matrix_plot(sre_tab, bin_centers =seq(-135,135,length.out =11)/1000)```:::#### Varying Compensating directionsWhile the previous charts used the same direction for all share ratio elasticities, we can also change them as a function of the components used in the share ratio of X.The compensating direction is particularly interesting for this purpose because the changes in the Y composition are entirely explained by changes in the two compensating components of the X composition.```{r}#| fig-show: holdsre_tab <-ShareRatioElasticities(VOTE_model,Xvar ="PROFCAT")sre_matrix_plot(sre_tab, bin_centers =seq(-10,10,by =2)*.0225)ggsave(here("out/figures/yx_compo_share_ratio_matrix.pdf"),width =7, height =3.5)```The values of the share-ratio elasticities shown above can be understood as a decomposition of the original clr parameters.More precisely, it is possible to show that, within any of the 24 blocks summing over all 15 elasticity values, we obtain $0.5 D_y D_x B^*$, a matrix proportional to the clr parameter values.```{r}B <-coef(VOTE_model,"clr",split =TRUE)[["ilr(PROFCAT)"]]B_df <-data.table(B,keep.rownames =TRUE)B_df <-melt(B_df,id.vars ="rn")B_df$Y <-factor(B_df$variable,levels =colnames(B))B_df$X <-factor(B_df$rn,levels =rev(rownames(B)))setDT(B_df)setDT(sre_tab)B_df2 <- sre_tab[,list(value2 =sum(SRE) /ncol(B) /nrow(B) *2), by =c("Yl", "Xl")]B_df2[B_df, value := i.value, on =c("Yl"="Y", "Xl"="X")]B_df2```The visual impression blow of parameter matrix $B$, confirms the link to the more detailed share ratio representation above.```{r}#| fig-show: holdggplot(B_df, aes(y = X, x = Y)) +geom_tile(aes(fill = value)) +geom_text(aes(label =round(value,3), size =abs(value)),fontface ="bold") +scale_size_continuous(range =c(3,4), guide ="none") +scale_x_discrete(position ="top") +scale_fill_continuous_diverging(palette ="Blue-Red 3") +labs(x ="", y ="") +guides(fill =guide_colorbar(barwidth = .5, barheight =15)) +theme_minimal() +coord_equal()ggsave(here("out/figures/clr_B_matrix_image.pdf"))```# Acknowledgements {.unnumbered}The example used to illustrate this work is taken from a Statistical Consulting project class of the Master in Data Science for Social Sciences of The Toulouse School of Economics, in cooperation with the Market Research agency BVA.The project was a great experience, and we want to thank all the participants, namely Olivier Hennebelle and Alejandro Lara, who advised the project from the side of BVA, and our four students, Claire Lebrun, Malo Bert, Kyllian James and Gael Charrier.